Organisational Prompts

Nonaka: Making Knowledge Explicit

Justin Arbuckle — Mon, 01 Jun 2026 07:01:26 GMT

The previous article argued that organisations cannot specify what they cannot articulate, and that Argyris’s defensive routines ensure the most important knowledge stays undiscussable. That was the diagnosis. This article provides the positive theory: how knowledge actually moves from the tacit to the explicit, what the conversion requires, and why AI makes the conversion simultaneously more urgent and more difficult.

Ikujiro Nonaka and Hirotaka Takeuchi, studying successful Japanese innovators in the 1980s and 1990s, built a model of organisational knowledge creation that answers a question the Deciding phase has been circling since Evans: if the domain expert knows how the business works but cannot write it down, what is the process by which what they know becomes something an AI can act on? Their answer is the SECI model, and its most important claim is that the conversion is not a documentation exercise. It is a creative act that requires specific conditions, specific interactions, and specific kinds of leadership. Most organisations provide none of them.

A footnote before we begin: Nonaka and Takeuchi also wrote “The New New Product Development Game” in 1986, the Harvard Business Review article that introduced the rugby metaphor for overlapping development phases. That article inspired Jeff Sutherland and Ken Schwaber to name their framework Scrum. The thinkers who gave us the theory of knowledge creation also, almost accidentally, gave us the most widely adopted agile methodology. The connection is not coincidental: both contributions rest on the same insight, that knowledge is created through iterative, cross-functional interaction, not through sequential, specialised handoffs.

1. The Tacit-Explicit Distinction: We Know More Than We Can Tell

Nonaka and Takeuchi built on Michael Polanyi’s foundational observation: “we can know more than we can tell.” Tacit knowledge is personal, context-specific, acquired through experience, and deeply rooted in action. The domain expert who can price a complex insurance risk in seconds is deploying tacit knowledge. The experienced developer who looks at a system design and senses it will not scale is deploying tacit knowledge. The leader who walks into a team and feels something is wrong is deploying tacit knowledge. In each case, the knowledge is real, valuable, and almost entirely inarticulate.

Explicit knowledge is the opposite: articulated, systematic, codified, and easily communicated. A specification is explicit knowledge. A policy document is explicit knowledge. An API contract is explicit knowledge. Western organisations, Nonaka and Takeuchi argued, are systematically biased toward the explicit. They invest in documentation, processes, knowledge management systems, and databases. They treat knowledge as something to be captured, stored, and retrieved. And they consistently undervalue the tacit knowledge that makes the explicit knowledge meaningful.

The connection to Argyris is precise. The gap between espoused theory and theory-in-use is the gap between explicit and tacit knowledge. The espoused theory (the process document, the policy manual, the specification) is explicit. The theory-in-use (how people actually work, the workarounds, the informal rules) is tacit. Argyris explained why the gap exists: defensive routines prevent articulation. Nonaka provides the process model for closing it.

The connection to Bourdieu is equally precise. Habitus is tacit knowledge in Nonaka’s terms: the embodied dispositions that generate practice below conscious awareness. The domain expert’s pricing judgment is habitus made productive. The organisation’s resistance to change is habitus made defensive. In both cases, the knowledge governs practice without being available for examination. Simon’s decision premises are the mechanism: the tacit knowledge shapes the premises that enter decisions without anyone noticing, because nobody has made the premises explicit.

2. The SECI Model: Four Conversions

Nonaka and Takeuchi’s central framework describes four modes of knowledge conversion, forming a continuous spiral.

Socialisation (tacit to tacit): knowledge shared through direct experience. Observation, imitation, practice, shared activity. This is apprenticeship: the junior developer who learns how to review code by sitting next to a senior developer. The domain expert who absorbs how the business works by spending years inside it. Socialisation is slow, requires physical or at least sustained proximity, and produces knowledge that remains tacit. It cannot be replaced by documentation.

Externalisation (tacit to explicit): the critical and most creative conversion. This is where tacit insights are made explicit through dialogue, metaphor, analogy, and conceptualisation. The famous Matsushita bread-maker example: an engineer named Ikuko Tanaka apprenticed herself to a master baker at the Osaka International Hotel because he could not articulate what made his bread exceptional. She noticed his distinctive twisting motion when kneading dough and translated that physical observation into a specification for the bread machine’s kneading mechanism. The observation was socialisation. The translation into a specification was externalisation. Without both, the bread machine would not have worked.

This is what specification writing demands. The domain expert knows how the business works. The specification writer must convert that tacit knowledge into explicit requirements that an AI can act on. If the conversion is done badly, the specification captures the espoused theory (what people say the business does) rather than the theory-in-use (what it actually does). Argyris explained why the conversion fails. Nonaka explains what it requires: sustained dialogue, metaphor (“it’s like kneading dough”), analogy (“the pricing logic works like a negotiation, not like a formula”), and the willingness to stay in the conversation long enough for the tacit to surface.

Combination (explicit to explicit): organising and integrating existing explicit knowledge. Merging documents, synthesising reports, restructuring databases. This is the easiest mode to automate and the mode AI performs best. An LLM that summarises a set of policy documents, cross-references specifications, or generates a consolidated report is performing combination. The danger, which Nonaka identified decades before AI made it acute, is mistaking exceptional combination for genuine knowledge creation. AI can recombine explicit knowledge at unprecedented speed. It cannot perform externalisation, because it has no access to the tacit knowledge that externalisation converts.

Internalisation (explicit to tacit): learning by doing. Converting explicit knowledge into embodied practice. Reading the specification and then building until the understanding becomes automatic. This completes the cycle: internalised knowledge becomes the new tacit knowledge that can be shared through socialisation, restarting the spiral. Dweck’s growth mindset provides the psychological precondition: internalisation requires treating explicit knowledge as something to be embodied through practice, not merely memorised.
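The four conversions above can be summarised as a small lookup table. What follows is a minimal Python sketch for illustration only, not anything Nonaka and Takeuchi published: the names `Knowledge`, `SECI_MODES`, `SPIRAL_ORDER`, `conversion_mode`, and `next_mode` are hypothetical.

```python
from enum import Enum


class Knowledge(Enum):
    TACIT = "tacit"
    EXPLICIT = "explicit"


# Each SECI mode is identified by the (source, target) form of knowledge.
SECI_MODES = {
    (Knowledge.TACIT, Knowledge.TACIT): "socialisation",
    (Knowledge.TACIT, Knowledge.EXPLICIT): "externalisation",
    (Knowledge.EXPLICIT, Knowledge.EXPLICIT): "combination",
    (Knowledge.EXPLICIT, Knowledge.TACIT): "internalisation",
}

# The order in which the spiral visits the modes, as described in the text.
SPIRAL_ORDER = ["socialisation", "externalisation", "combination", "internalisation"]


def conversion_mode(source: Knowledge, target: Knowledge) -> str:
    """Return the SECI mode that converts `source` knowledge into `target`."""
    return SECI_MODES[(source, target)]


def next_mode(current: str) -> str:
    """Return the mode that follows `current`; the spiral wraps around."""
    i = SPIRAL_ORDER.index(current)
    return SPIRAL_ORDER[(i + 1) % len(SPIRAL_ORDER)]


print(conversion_mode(Knowledge.TACIT, Knowledge.EXPLICIT))  # externalisation
print(next_mode("internalisation"))                          # socialisation
```

The wrap-around in `next_mode` is the only structural point the sketch makes: internalisation feeds the next round of socialisation, so the model is a cycle, not a pipeline.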

3. Why Externalisation Is the Bottleneck

The four modes are not equally important. Externalisation, the conversion from tacit to explicit, is the bottleneck of the entire knowledge creation process, and it is the bottleneck of specification-driven development. Everything else depends on it. Socialisation transfers tacit knowledge without making it explicit. Combination reorganises what is already explicit. Internalisation embeds the explicit back into practice. Only externalisation creates the new explicit knowledge that the organisation, and the AI, can work with.

Evans’s knowledge crunching is externalisation in action. The iterative dialogue between developers and domain experts, in which the domain model is constructed through conversation, challenge, and revision, is precisely the process Nonaka describes. The developer asks “how does the pricing work?” The domain expert says “it’s complicated.” The developer proposes a model. The expert says “that’s not quite right.” The model is revised. This cycle, repeated dozens of times, gradually externalises the tacit knowledge that the expert could not articulate in a single sitting.

Klein’s pattern recognition adds a layer. The domain expert whose intuition Klein validated is the person with the richest tacit knowledge. They are also the person for whom externalisation is hardest, because the more expert you are, the more your knowledge has been compressed into patterns that fire below conscious articulation. Asking the expert to explain their judgment is asking them to decompress what years of experience have compressed. The process is uncomfortable, slow, and essential.

Kahneman’s noise enters here too. If externalisation is always imperfect, always a lossy conversion from rich tacit knowledge to simplified explicit representation, then different externalisation sessions will produce different explicit knowledge from the same tacit base. Two specification workshops with the same domain expert on different days will produce different specifications, not because the expert’s knowledge has changed but because the externalisation process is inherently noisy. Decision hygiene, structuring the externalisation dialogue with defined dimensions and independent assessment, reduces this noise without eliminating it.

4. Ba: The Conditions Externalisation Requires

Nonaka introduced the concept of ba (a Japanese term meaning place or context) to describe the conditions required for each mode of knowledge conversion. Externalisation requires what he called dialoguing ba: a context of peer-to-peer interaction, mutual trust, and sustained conversation. This is not a meeting room with a facilitator and a timer. It is a relationship between people who trust each other enough to say “I don’t know how to explain this, but let me try.”

The connection to Edmondson’s psychological safety is direct: dialoguing ba requires that the domain expert can say “actually, it doesn’t work the way the policy says” without professional consequence. The connection to Heifetz’s holding environment is equally direct: the leader’s job is to create and protect the space in which externalisation can happen, which means protecting the participants from the political consequences of making the undiscussable explicit.

Beer’s architecture provides the structural dimension. Nonaka’s ba is the social context; Beer’s System 3* (the audit channel) is the architectural mechanism that connects the externalised knowledge to the decision system. Without System 3*, the knowledge externalised in a workshop stays in the workshop. With it, the new explicit knowledge enters the decision premises that shape organisational action. The architecture and the social context are both necessary. Neither is sufficient alone.

5. Where AI Sits in the Spiral

The SECI model clarifies exactly what AI can and cannot do in the knowledge creation process.

AI excels at combination. It can synthesise, reorganise, cross-reference, and recombine explicit knowledge at a speed and scale no human team can match. This is genuinely valuable. But combination is the mode that creates least new knowledge. It reorganises what is already known.

AI cannot perform socialisation. It cannot learn by sitting next to a master practitioner, absorbing the rhythms and intuitions of a craft. It has no body to observe with, no empathy to share experience through, no relationship in which tacit knowledge transfers.

AI cannot perform externalisation. It can assist: it can ask questions, propose models, challenge descriptions, and generate drafts that the domain expert reacts to. But the conversion from tacit to explicit must happen in the human, because the tacit knowledge lives in the human. The AI is a mirror, not a source. Evans’s knowledge crunching, assisted by AI-generated prototype models that the expert can react to (“that’s not quite right; the pricing actually works more like this”), may be the most productive use of AI in the entire specification process. But the creative act remains human.

AI can accelerate internalisation by generating worked examples, simulations, and practice scenarios from explicit knowledge. A developer learning a new domain can use AI to generate cases that test their understanding, turning the specification into practice exercises that build tacit mastery.

The implication for the Deciding phase: organisations that deploy AI primarily for combination (summarising documents, generating reports, cross-referencing data) are using AI where it adds least value to knowledge creation. Organisations that use AI to support externalisation (generating prototype models for domain experts to react to, proposing specification drafts that surface tacit assumptions through the expert’s corrections) are using AI where it adds most value. The difference is whether AI is reorganising what is already known or helping to surface what has never been articulated.

6. Nonaka’s Limits

Nonaka must be read with his limitations visible. The SECI model emerged from Japanese corporate culture, with its emphasis on socialisation, group harmony, and apprenticeship. Its applicability in Western individualist contexts is contested. The tacit-explicit distinction may be a continuum rather than a dichotomy. The empirical evidence, including the bread-maker case, is illustrative rather than rigorous. And critically, the model does not address power, politics, or conflict. Stacey would argue that Nonaka presents knowledge creation as a manageable, facilitatable process when in reality it emerges from messy, political, anxiety-laden interactions. Argyris would add that the defensive routines preventing externalisation are not merely obstacles to be overcome but structural features of the organisation that serve real protective functions.

The practical limitation is that externalisation cannot be scheduled. You cannot put “convert tacit knowledge to explicit” on a project plan and expect it to happen by Thursday. It happens through relationships, through sustained dialogue, through the kind of unstructured conversation that most organisations have systematically eliminated in the name of efficiency. The leader’s task is not to manage the knowledge spiral but to protect the conditions in which it can turn.

(An Organisational Prompt is something you can do now....)

Identify one piece of tacit knowledge your AI needs.

Pick one domain where your organisation is deploying AI. Ask: what does the most experienced practitioner in this domain know that is not written down anywhere? Not the process documentation. Not the policy manual. The thing they know that nobody has ever articulated, the judgment call, the exception that is not in the rules, the pattern they recognise but cannot explain. Now ask: does your AI deployment plan include any mechanism for converting that knowledge into something the AI can use? If the answer is no, your AI is being built on explicit knowledge only, which means it is being built on what the organisation says it does rather than on what it actually does. The conversion from tacit to explicit is the work your project plan has not accounted for, and it is the work that determines whether the AI will produce useful output or confident nonsense.

Further Reading

Ikujiro Nonaka and Hirotaka Takeuchi: The Knowledge-Creating Company - The foundational text. The SECI model, the knowledge spiral, and the argument that Western organisations systematically undervalue tacit knowledge. Read it for the bread-maker case and the conditions for knowledge creation.

Ikujiro Nonaka: The Knowledge-Creating Company - The article that introduced the ideas to a management audience. Shorter and more accessible than the book.

Hirotaka Takeuchi and Ikujiro Nonaka: The New New Product Development Game - The article that inspired Scrum. Overlapping development phases, cross-functional teams, and the rugby metaphor. Essential for anyone who uses agile methods and wants to understand their intellectual origin.

Michael Polanyi: The Tacit Dimension - The philosophical foundation. “We can know more than we can tell.” Short, profound, and the starting point for everything Nonaka built on.

I write about the industry and its approach in general. None of the opinions or examples in my articles necessarily relate to present or past employers. I draw on conversations with many practitioners and all views are my own.

Argyris: The Importance of What You Cannot Say

Justin Arbuckle — Thu, 28 May 2026 07:00:52 GMT

The Learning phase article on Argyris diagnosed why smart people are often the worst at learning. Defensive routines, the gap between espoused theory and theory-in-use, skilled incompetence: the mechanisms by which successful professionals protect themselves from the discomfort of examining their own reasoning. That was a learning problem. This is the deciding problem that lives inside it: you cannot specify what you cannot articulate, and you cannot articulate what the organisation has made undiscussable.

Every specification problem is, at its root, an articulation problem. The domain expert who cannot explain why the system should behave this way rather than that way is not stupid. They know. They have been doing the work for years. But the knowledge lives in what Argyris called the theory-in-use: the actual rules governing behaviour, which operate below conscious articulation and are often directly contradicted by the espoused theory, the rules people claim to follow. The specification demands that tacit knowledge become explicit. The defensive routines ensure it cannot.

1. The Undiscussable: Why Specifications Miss What Matters Most

Argyris’s most devastating observation is that organisations develop elaborate mechanisms for not discussing the things that matter most. A topic becomes undiscussable when raising it would threaten someone’s competence, status, or control. The undiscussability itself then becomes undiscussable: everyone knows the topic cannot be raised, but nobody can say so. The silence is perfectly maintained by people who are not conscious of maintaining it.

In specification work, the undiscussables are the business rules that nobody has ever written down because writing them down would expose contradictions, incompetence, or political arrangements that benefit from ambiguity. The pricing logic that varies by client relationship but is officially uniform. The approval workflow that is formally three steps but informally seven, with the extra four existing to protect specific people’s authority. The risk threshold that the policy document says is one thing but the actual practice says is another, because the policy was written for the regulator and the practice was designed for the commercial reality.

These are precisely the rules the AI needs to know. They are precisely the rules nobody can say.

Evans’s knowledge crunching assumes that developers and domain experts will sit together and, through iterative dialogue, surface the domain model. Argyris explains why this process reliably fails in practice: the domain expert cannot articulate the theory-in-use because it has never been conscious, and the organisation cannot surface it because doing so would make the undiscussable discussable. The developer asks “how does the pricing work?” The domain expert gives the espoused theory. The specification is written against the espoused theory. The AI generates code that implements the espoused theory. The system goes into production and produces wrong answers, because the actual pricing follows the theory-in-use, which nobody articulated because it was never safe to do so.

2. Skilled Incompetence in the Deciding Phase

In the Learning phase, skilled incompetence described the ability of successful professionals to avoid examining their own reasoning. In the Deciding phase, it has a sharper manifestation: the ability of organisations to produce decisions that look rigorous while avoiding the reasoning that would make them genuinely informed.

The AI governance board is the canonical example. The board meets. Papers are circulated. Risks are assessed. Matrices are completed. A decision is recorded. At no point does anyone say: “We do not actually understand what this AI system will do in production, because the specification it was built from does not describe how the business actually works.” That sentence is undiscussable, because it would imply that the specification process the board oversees is not working, which would threaten the authority of the people who designed it, which would make them defensive, which would trigger the very Model I behaviours (exercising unilateral control, suppressing negative feelings, maximising winning) that Argyris documented.

The board’s skilled incompetence is not that it makes bad decisions. It is that it makes decisions that are disconnected from the information that matters, while appearing to be thoroughly informed. The paperwork is impeccable. The reasoning is invisible. Kahneman would call this noise masked by process. Beer would call it an accountability sink. Argyris names the mechanism: defensive routines have colonised the decision architecture, ensuring that the information the architecture was designed to process never enters it.

3. The Ladder of Inference: How Specifications Drift from Reality

Argyris and his colleagues developed the ladder of inference to show how people move from observable data to action through a series of increasingly abstract steps, each of which introduces assumptions that are never tested. You observe data. You select data (filtering what you notice). You add meaning (interpreting what you noticed). You make assumptions (based on the meaning you added). You draw conclusions. You adopt beliefs. You take action. Each rung of the ladder takes you further from the observable reality and closer to a self-reinforcing interpretation that feels like fact.

Specification writing climbs the ladder of inference at every step. The domain expert observes a business process. They select the parts they consider important (filtering out the exceptions, the workarounds, the unofficial practices). They add meaning (“this is how we handle onboarding”). They make assumptions (“the AI needs to replicate this process”). They draw conclusions (“the specification should describe these steps”). They adopt beliefs (“this specification accurately represents our business”). They hand the specification to the AI.

The problem is not at any individual rung. The problem is that nobody climbs back down. Nobody tests whether the selected data was the right data. Nobody checks whether the meaning added was accurate. Nobody challenges whether the assumptions hold. Argyris showed that in Model I behaviour, people advocate their position without inviting inquiry. The specification writer who presents their specification as “how the business works” is advocating without inquiry. The reviewer who approves it without asking “what did you leave out and why?” is colluding in the ascent.
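One way to make "climbing back down" concrete is to record each rung alongside a flag saying whether anyone has gone back to check it. The sketch below is an illustrative assumption, not a tool Argyris proposed: the `Rung` class, the `reviewed` flag, and the `unreviewed` helper are hypothetical names, and the rung contents are taken from the onboarding example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rung:
    name: str               # the rung of the ladder, as named in the text
    content: str            # what the specification writer did at this rung
    reviewed: bool = False  # hypothetical flag: has anyone climbed back down to check it?


def unreviewed(ladder: List[Rung]) -> List[str]:
    """Return the names of rungs that nobody has gone back to check."""
    return [rung.name for rung in ladder if not rung.reviewed]


# The onboarding example from the text; every rung starts unreviewed.
ladder = [
    Rung("observe data", "watch the business process"),
    Rung("select data", "drop exceptions, workarounds, unofficial practices"),
    Rung("add meaning", "this is how we handle onboarding"),
    Rung("make assumptions", "the AI needs to replicate this process"),
    Rung("draw conclusions", "the specification should describe these steps"),
    Rung("adopt beliefs", "this specification accurately represents our business"),
    Rung("take action", "hand the specification to the AI"),
]

# A reviewer asks "what did you leave out and why?" and re-checks the selection.
ladder[1].reviewed = True

print(unreviewed(ladder))
```

A specification review that leaves this list long has approved the conclusion without examining the steps that produced it.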

Simon’s decision premises reframe this structurally. The ladder of inference describes how premises become progressively more detached from the observable world. By the time the premise reaches the decision (or the specification), it has been filtered through so many layers of interpretation that it may bear little resemblance to the reality it claims to describe. Simon asks how the right premises reach the right people. Argyris asks why the premises that do arrive have been systematically distorted by the defensive needs of the people who produced them.

4. Model II as a Specification Discipline

Argyris’s Model II is usually presented as a personal skill: make your reasoning explicit, invite genuine challenge, combine high advocacy with high inquiry. In the Deciding phase, it becomes a specification discipline.

A Model II specification process looks like this. The specification writer presents the specification and simultaneously presents the reasoning behind it: “I specified the pricing logic this way because I believe the discount structure works like this. Here is the evidence I used. Here is what I am uncertain about. Here are the parts where I had to guess because nobody could give me a clear answer.” They then invite challenge: “Where am I wrong? What have I missed? What do you know that contradicts what I have written?”
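The process described above implies that a specification entry carries more than the rule itself. A minimal sketch of such a record follows, assuming a hypothetical `SpecEntry` structure and `review_agenda` helper; the field names and the example content are invented for illustration, and the three challenge questions are the ones quoted in the paragraph above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpecEntry:
    rule: str                                               # what the specification says
    reasoning: str                                          # why the writer specified it this way
    evidence: List[str] = field(default_factory=list)       # what the writer relied on
    uncertainties: List[str] = field(default_factory=list)  # what the writer is unsure about
    guesses: List[str] = field(default_factory=list)        # where nobody could give a clear answer


# The three invitations to challenge quoted in the text.
CHALLENGES = [
    "Where am I wrong?",
    "What have I missed?",
    "What do you know that contradicts what I have written?",
]


def review_agenda(entry: SpecEntry) -> List[str]:
    """List the points to put in front of the domain expert, weakest first."""
    agenda = [f"Guess: {g}" for g in entry.guesses]
    agenda += [f"Uncertain: {u}" for u in entry.uncertainties]
    if not entry.evidence:
        agenda.append("No evidence recorded for this rule")
    return agenda + CHALLENGES


# Hypothetical example content, for illustration only.
entry = SpecEntry(
    rule="Discount is applied by client tier",
    reasoning="I believe the discount structure works by tier",
    evidence=["published rate card"],
    guesses=["how relationship managers adjust discounts"],
)

for item in review_agenda(entry):
    print(item)
```

The design choice is that guesses and uncertainties are first-class fields. A template with no place to record a guess quietly turns every guess into an assertion.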

This is what Evans’s knowledge crunching requires but does not describe the conditions for. Evans assumes the dialogue will happen. Argyris explains why it will not happen unless the conditions are explicitly created. The domain expert who says “actually, the discount structure does not work that way; it depends on the relationship manager’s judgment, which is never documented” is making an undiscussable discussable. They will only do this if the environment rewards honesty rather than punishing it, which is Edmondson’s psychological safety operating as a precondition for Model II behaviour.

Klein’s pre-mortem is Model II made structural. Instead of asking people to change their defensive routines (which Argyris acknowledged is extraordinarily difficult), the pre-mortem creates a context in which the undiscussable becomes expected. “Imagine this specification has been implemented and the system is producing wrong answers. Why?” The answers will contain the undiscussables: the business rules nobody wrote down, the exceptions nobody mentioned, the political arrangements that the specification politely omitted. The pre-mortem works not because it changes people but because it changes the question.

5. Why the Gap Between Espoused Theory and Theory-in-Use Is the Specification Gap

The deepest connection between Argyris and the Deciding phase is this: the gap between espoused theory and theory-in-use is precisely the gap between the specification and reality.

Every organisation has an espoused theory of how it works: the process documentation, the policy manuals, the architecture diagrams, the operating procedures. And every organisation has a theory-in-use: the actual practices, workarounds, informal agreements, and undocumented decisions that govern what people really do. The two rarely match. The gap between them is not a documentation failure. It is a structural feature of organisations that have optimised for the appearance of order while accommodating the messiness of reality.

AI does not accommodate the gap. AI takes the espoused theory (the specification, the documentation, the formal rules) and implements it literally. The theory-in-use, the part that makes the business actually work, is invisible to the AI because nobody has articulated it. The result is a system that perfectly implements what the organisation says it does and completely fails to do what the organisation actually does.

POSIWID applies at the specification level: the purpose of the specification is what it produces. If the specification produces a system that does not match reality, then the specification’s actual purpose was to document the espoused theory, not to describe the business. And this is almost always its actual purpose, because documenting the espoused theory is safe (it matches the official narrative) while documenting the theory-in-use is dangerous (it exposes the gap).

Drucker’s theory of the business sits one level above this. The theory of the business is the set of assumptions that generates both the espoused theory and the theory-in-use. When the theory of the business is valid, the gap between the two is small and manageable. When the theory is invalid, the gap widens because the espoused theory continues to express the official assumptions while the theory-in-use adapts to a reality the assumptions no longer describe. The specification inherits whichever version it is given. Without Argyris’s diagnostic, nobody can tell which version that is.

6. Argyris’s Limits

Argyris must be read with his limitations visible. His framework was developed in Western, individualistic contexts, and its applicability in collective cultures is contested. The distinction between Model I and Model II can be overly normative: Model II is presented as universally superior, but in some organisational contexts, the defensive routines serve genuine protective functions that Model II behaviour would strip away without providing an alternative.

Stacey poses the deepest challenge. Argyris assumes there exists a position from which reasoning can be examined and improved: you can step outside your defensive routines and observe them. Stacey argues this position does not exist, because the observer is embedded in the same responsive processes as the observed, and the act of examination is itself shaped by the dynamics it claims to examine. The debate is genuine. This series holds both: Argyris provides the diagnostic that Stacey says is impossible but practitioners find indispensable.

The practical limitation is that Model II behaviour is extraordinarily difficult to learn. Argyris himself found that even after years of training, most professionals could articulate Model II principles (which became their new espoused theory) while continuing to operate in Model I (the unchanged theory-in-use). The programmes reproduced the very gap they were designed to close. This is not a reason to abandon Argyris. It is a reason to complement him with structural interventions, like Klein’s pre-mortem and Kahneman’s decision hygiene, that reduce the dependence on individual behavioural change and instead redesign the decision environment.

(An Organisational Prompt is something you can do now....)

Find one undiscussable in one specification.

Take a specification your organisation has recently produced, one that is considered complete and approved. Sit with the domain expert who provided the business rules, in private, without the project manager or the governance people present. Ask: “Is there anything about how this actually works that is not in this document?” Then be quiet. Wait. The silence will be uncomfortable. What follows will be the most valuable information in the entire project, because it will be the information that the specification process was designed, structurally, not to capture. You do not need to fix the process today. You need to see the gap. Once you have seen it in one specification, you will see it in all of them.

Further Reading

Chris Argyris: Teaching Smart People How to Learn - The single most important Argyris article. Why the most successful professionals are the worst at learning, and why leadership development programmes reproduce the gap they are designed to close. Freely accessible.

Chris Argyris: Overcoming Organizational Defenses - The fullest treatment of defensive routines, the ladder of inference, and Model I/Model II applied to organisations. The book to read if you want to understand why your specification process captures the espoused theory and misses the theory-in-use.

Chris Argyris and Donald Schön: Organizational Learning II: Theory, Method, and Practice - The collaborative framework that extends single-loop and double-loop learning to the organisational level. Deutero-learning, the capacity to learn how to learn, is the concept this series keeps returning to.

Chris Argyris: Knowledge for Action - The practitioner-oriented treatment. Case studies of organisations attempting Model II and the specific ways they fail. Read it for the honest assessment of how hard this is.

Klein: Trust Your Gut (Sometimes)

Justin Arbuckle — Mon, 25 May 2026 07:00:57 GMT

The previous article argued that your organisation’s decisions scatter more than anyone believes, and that noise, the random variability in professional judgment, is at least as damaging as bias. The natural response is to structure everything: rubrics, algorithms, checklists, mechanical aggregation. Remove the human. Remove the variability. This is half right and half catastrophic. Gary Klein spent thirty years studying people who make life-or-death decisions under time pressure, people whose intuition works, and his research shows that the impulse to replace expert judgment with process will destroy exactly the capability your organisation needs most.

Klein is a cognitive psychologist who founded the Naturalistic Decision Making movement. Where Kahneman studied decision-making in the laboratory, Klein studied it in burning buildings, intensive care units, military command posts, and offshore oil platforms. Where Kahneman found systematic error, Klein found systematic competence. The same cognitive mechanism, System 1 pattern recognition, produces both. The difference is not in the person. It is in the environment. This distinction, which Klein and Kahneman eventually agreed on after years of adversarial collaboration, is the most useful framework in the decision science literature for anyone trying to figure out which of their organisation’s experts to trust and which to overrule.

1. Recognition-Primed Decision: How Experts Actually Decide

Klein’s central finding is that experts do not decide the way decision theory says they should. They do not generate multiple options, weigh them against criteria, and select the best. They recognise the situation, generate a single course of action based on pattern recognition, mentally simulate it to check whether it will work, and act. If the simulation reveals a problem, they modify the action or generate the next most plausible option. The process is serial (one option at a time), not parallel (comparing multiple options simultaneously).

Klein calls this the Recognition-Primed Decision model. He discovered it by studying fireground commanders: people who make decisions about where to send crews into burning buildings, with lives at stake, under extreme time pressure, with incomplete and changing information. These commanders almost never compared options. They looked at the fire, recognised a pattern from their experience, knew what to do, and did it. When Klein asked them to explain their decisions, they often could not articulate the reasoning. They said things like “it just felt right” or “I could see it was going to go bad.” This is not mysticism. It is pattern recognition operating below conscious articulation but above random guessing.

The model has three levels. At Level 1, the situation is immediately recognised and the action is obvious: the experienced firefighter sees a backdraft pattern and orders evacuation without deliberation. At Level 2, the situation requires diagnosis: the pattern is not immediately clear, so the expert runs mental simulations until one fits. At Level 3, the situation is complex enough that the expert must evaluate a course of action by imagining its consequences, modify it if the simulation reveals problems, and iterate. Even at Level 3, the process is not comparison. It is generation, simulation, and modification of a single line of action.
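The serial character of the model is easier to see in code than in prose. The sketch below is an illustration only, not Klein's own formalism: the function name, the `simulate` and `modify` callbacks, and the fireground options are all hypothetical. It shows one option being generated, mentally simulated, and then either accepted, modified, or replaced by the next most plausible option, with no side-by-side comparison at any point.

```python
from typing import Callable, Iterable, Optional


def recognition_primed_decision(
    options_by_typicality: Iterable[str],
    simulate: Callable[[str], Optional[str]],
    modify: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Return the first course of action that survives mental simulation.

    Options are considered one at a time, most typical first. `simulate`
    returns None if the action is expected to work, or a description of
    the problem it foresees. `modify` returns an adjusted action, or None
    if the action cannot be repaired.
    """
    for action in options_by_typicality:
        problem = simulate(action)
        if problem is None:
            return action  # first workable option wins; nothing is compared
        revised = modify(action, problem)
        if revised is not None and simulate(revised) is None:
            return revised
    return None  # nothing in the pattern library fits


# Hypothetical fireground example.
def simulate(action: str) -> Optional[str]:
    foreseen = {"interior attack": "floor may collapse"}
    return foreseen.get(action)


def modify(action: str, problem: str) -> Optional[str]:
    if problem == "floor may collapse":
        return "attack from the doorway"
    return None


choice = recognition_primed_decision(
    ["interior attack", "defensive exterior attack"], simulate, modify
)
print(choice)
```

In this example the second option is never examined, because the modified first option already passes simulation.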

For the series, this matters because it describes how the best people in your organisation actually work. The senior architect who looks at a system design and says “that won’t scale” is not guessing. They are recognising a pattern from hundreds of systems they have seen before. The domain expert who reads a specification and says “that’s not how we do it” is not being obstructive. They are matching the specification against a library of domain situations built over years. The experienced leader who walks into a struggling team and senses something is wrong before anyone has said a word is reading cues that their pattern library can decode and their conscious mind cannot yet articulate.

2. The Pattern Library: What Expertise Actually Is

Klein’s research redefines expertise. It is not superior analytical ability. It is a richer, more accurate library of situation-action patterns built through experience. Simon estimated that expertise requires roughly 50,000 chunks of domain knowledge, accumulated over approximately ten years of deliberate practice. Klein’s fieldwork confirms this: the expert’s advantage is not that they think harder but that they see more. They perceive cues that novices miss. They recognise patterns that novices have never encountered. They generate expectations about what will happen next, and when those expectations are violated, they know something has changed.

Four elements activate simultaneously when an expert recognises a situation: cues (what they notice in the environment), expectancies (what they predict will happen next), goals (what they are trying to achieve), and actions (what to do about it). These do not fire sequentially. They fire as a package. The firefighter does not first perceive the cue, then predict the trajectory, then identify the goal, then select the action. They perceive the situation and know what to do, in a single cognitive act. This is what “intuition” means when it works: not a feeling disconnected from evidence, but compressed expertise recognising a familiar pattern and activating the appropriate response.

The implication for organisations is that expert judgment is not a soft skill to be tolerated. It is an asset to be cultivated, protected, and deployed strategically. The organisation that replaces expert judgment with checklists in domains where expertise is valid has destroyed its most valuable decision-making resource. The organisation that defers to expert judgment in domains where expertise is invalid has handed its future to confident pattern-matchers operating in an environment that does not reward pattern-matching.

The question, as always, is which domains are which.

3. When Intuition Works: High-Validity Environments

The Kahneman-Klein adversarial collaboration, published in 2009 after years of argument, produced a resolution that is more useful than either position alone. They agreed: intuition is trustworthy when two conditions are met.

First, the environment must be sufficiently regular that patterns exist to be learned. Chess is regular: the same positions recur and the rules do not change. Firefighting is regular: fire behaviour follows physical laws, and while each fire is different, the patterns are learnable. These are high-validity environments. There is a stable, underlying structure that rewards pattern recognition.

Second, the decision-maker must have had prolonged practice with valid feedback. The feedback must be prompt (you learn quickly whether your decision was right), clear (the outcome is unambiguous), and connected to the decision (you can attribute the outcome to your choice, not to luck or other factors). A chess player gets immediate, unambiguous feedback after every move. A surgeon gets feedback within hours: the patient recovers or does not. A firefighter gets feedback within minutes: the building behaves as predicted or it does not.

When both conditions are met, intuition is not just acceptable. It is superior to analytical methods. The expert operating in a high-validity environment with years of valid feedback will consistently outperform the checklist, the algorithm, and the committee. This is Klein’s core finding, and it has been replicated across domains from military command to intensive care nursing to chess.

Evans’s knowledge crunching produces high-validity environments by design. When developers and domain experts work together iteratively, testing the model against reality and refining it through feedback, they are building the conditions Klein describes: a domain with learnable regularities and prompt, clear feedback. The domain expert who has been through months of knowledge crunching has valid intuition about the domain model. Their judgment about what the specification should say is trustworthy, because it has been calibrated by the exact process Klein’s research describes.

4. When Intuition Fails: Low-Validity Environments

Kahneman’s contribution to the collaboration was equally important. He insisted, and Klein agreed, that many professional environments do not meet the two conditions. The environment is irregular, the feedback is delayed, or the feedback is ambiguous. In these environments, expert intuition is unreliable regardless of the expert’s experience or confidence.

Stock picking is a low-validity environment: the market is too complex and too influenced by other actors for patterns to be reliably learnable. Political prediction is a low-validity environment: the feedback is delayed by years and confounded by countless variables. Long-range strategic forecasting is a low-validity environment: the outcome depends on factors the forecaster cannot observe or control.

AI strategy is a low-validity environment. The technology changes faster than any executive can accumulate valid experience. The feedback is delayed by months or years. The feedback is ambiguous: when an AI initiative fails, it is never clear whether the failure was caused by the strategy, the implementation, the technology, the culture, or the timing. The executive who says “I have a gut feeling about where AI is heading” is exhibiting exactly the confident pattern-matching that Kahneman’s research shows is unreliable in environments this novel.

This does not mean the executive’s judgment is worthless. It means their judgment about AI strategy should be treated differently from the domain expert’s judgment about specification quality. The first operates in a low-validity environment and should be structured, tested, and challenged. The second operates in a high-validity environment and should be trusted, protected, and amplified. The organisation needs both. The decision architecture must distinguish between them.
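As a minimal sketch, the two conditions can be written down as an explicit check. The field names and the two example contexts are hypothetical, and real environments are rarely this binary; the point is only that the decision to trust or structure a judgment depends on the environment and the feedback, not on how confident the person is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgmentContext:
    regular_environment: bool    # stable patterns exist to be learned
    prolonged_practice: bool     # the person has long experience in it
    feedback_prompt: bool        # outcomes arrive quickly
    feedback_clear: bool         # outcomes are unambiguous
    feedback_attributable: bool  # outcomes can be tied to the decision


def treatment(ctx: JudgmentContext) -> str:
    """Return 'trust' only when both Kahneman-Klein conditions hold."""
    valid_feedback = (
        ctx.feedback_prompt and ctx.feedback_clear and ctx.feedback_attributable
    )
    if ctx.regular_environment and ctx.prolonged_practice and valid_feedback:
        return "trust"
    return "structure"


# Hypothetical classifications, following the examples in the text.
domain_expert_on_specification = JudgmentContext(True, True, True, True, True)
executive_on_ai_strategy = JudgmentContext(False, False, False, False, False)

print(treatment(domain_expert_on_specification))
print(treatment(executive_on_ai_strategy))
```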

Beer’s System 3* (the audit channel) is the architectural mechanism for making this distinction. The audit channel provides direct, unfiltered access to what is actually happening. In Klein’s terms, it tests whether the environment is providing valid feedback. If the audit reveals that the domain expert’s intuitions are consistently confirmed by the AI-generated output, you are in a high-validity environment and the expert’s judgment should be trusted. If the audit reveals that the strategic forecast is consistently wrong, you are in a low-validity environment and the judgment should be structured.

5. The Pre-Mortem: Klein’s Most Practical Tool

Klein’s most widely adopted contribution is the pre-mortem. The method is simple: before a decision is implemented, the team imagines it has already been implemented and has failed. Each member independently writes down the reasons for the failure. The results are collected and discussed.

The pre-mortem works because it inverts the cognitive dynamics that Kahneman identified. WYSIATI (What You See Is All There Is) suppresses awareness of what could go wrong, because the plan is coherent and the team is committed. The pre-mortem gives explicit permission to name what could go wrong, bypassing the social pressure to agree. Overconfidence is reduced because the team has been asked to generate failure narratives, not success narratives. And because the exercise is individual before it is collective, it captures the disagreement that Kahneman’s decision hygiene requires: the independent judgment that group dynamics would otherwise suppress.

For AI transformation, the pre-mortem has a specific application. Before deploying an AI-assisted workflow, before rolling out a specification-driven development process, before restructuring teams around domain boundaries, imagine it has failed. What went wrong? The answers will surface the assumptions the plan relies on but has never tested. They will name the dependencies the plan assumes but has never verified. And they will reveal the political objections that will emerge once the plan threatens the people whose roles depend on the current architecture.

The pre-mortem is Argyris made structural. Argyris showed that defensive routines suppress the information the organisation needs. The pre-mortem creates a structured context in which the undiscussable becomes not just discussable but expected. It is not a cultural intervention. It is an architectural one. And it works in organisations that would resist Argyris’s deeper prescription, because it does not require anyone to change their defensive routines. It requires only that they answer a question.

6. Klein’s Limits

Klein must be read with his limitations visible. His model is descriptive, not normative: it describes how experts do decide, not how they should. The expert whose pattern library contains bad patterns will execute those patterns with the same speed and confidence as the expert whose library is good. RPD does not distinguish between valid and invalid expertise. The Kahneman-Klein resolution does, but only by stepping outside Klein’s framework and asking about the environment.

Klein’s model also struggles with genuinely novel situations. If no pattern exists in the expert’s library, RPD has nothing to work with. The expert in an entirely new domain, the experienced insurance underwriter encountering AI-generated risk models for the first time, has no relevant patterns. Their intuition will default to the closest available analogue, which may be dangerously wrong. Taleb’s Black Swan territory is precisely where Klein’s model has least to offer and where structured, humble, experimental approaches have most value.

The deepest tension is with Simon. Simon says: design the decision environment so the right premises reach the right people. Klein says: trust the expert who has been calibrated by the right environment. These are not contradictory. They are complementary, and the organisation needs both. Design the environment (Simon) to produce experts with valid pattern libraries (Klein), and then trust those experts to decide. The architecture creates the conditions for expertise. The expertise produces the decisions. Neither works without the other.

(An Organisational Prompt is something you can do now....)

Run a pre-mortem on your next AI decision.

Before you approve the next initiative or the next team restructuring, gather the people who will be affected. Tell them: “Imagine it is six months from now and this has failed completely. What went wrong?” Give them five minutes to write independently. Then read the answers aloud. The things they write will be the things they already know but have not been able to say. The pre-mortem does not require courage. It requires only a question and five minutes of silence. The information that emerges will be more valuable than the analysis that preceded it, because the analysis was constructed by people who wanted the plan to succeed, and the pre-mortem was constructed by people who were given permission to imagine it failing.

Further Reading

Gary Klein: Sources of Power: How People Make Decisions - The foundational text on naturalistic decision-making. The RPD model, the fireground studies, and the argument that expertise is pattern recognition, not analysis. The most important book on how experts actually decide.

Gary Klein: Seeing What Others Don’t: The Remarkable Ways We Gain Insights - How breakthroughs happen: by challenging assumptions, making connections, and noticing contradictions. The insight research that complements the RPD model.

Gary Klein: Streetlights and Shadows: Searching for the Keys to Adaptive Decision Making - Ten claims about how we should make decisions, and the research that challenges each one. The best Klein book for a reader sceptical of the “trust intuition” message.

Daniel Kahneman and Gary Klein: Conditions for Intuitive Expertise: A Failure to Disagree - The adversarial collaboration. When intuition works and when it does not. The single most useful paper in the decision science literature for practitioners.

Kahneman: The Scatter of Noise in Your Decisions

Justin Arbuckle — Thu, 21 May 2026 07:00:55 GMT

The Learning phase article on Kahneman introduced System 1 and System 2 as the cognitive architecture beneath organisational resistance to change. Bias was the headline: the systematic errors that make leaders overconfident, anchored to first impressions, and blind to what they do not know. That was a learning problem. This is the deciding problem that Kahneman spent his final years working on, and it is worse than bias: your organisation’s decisions are not just systematically wrong. They are randomly wrong. Different people, facing the same facts, on different days, reach different conclusions. And nobody notices.

Kahneman, with Olivier Sibony and Cass Sunstein, called this noise.

1. Noise Is Not Bias

Bias is a systematic deviation from the correct answer. If every underwriter in your insurance business overestimates risk for applicants from certain postcodes, that is bias: the error points in the same direction every time. Bias is visible in aggregate. You can measure it, name it, and design interventions to correct it.

Noise is the variability in judgments that should be identical. Two underwriters, given the same file, on the same day, produce different risk assessments. The same underwriter, given the same file on a different day, produces a different risk assessment. The error does not point in a consistent direction. It scatters. And because it scatters, it is invisible in averages. The organisation’s mean judgment may look reasonable while the individual judgments that compose it are wildly inconsistent.
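A small numerical sketch makes the difference concrete. It assumes, unrealistically, that the correct answer for a case is known, and it uses the standard statistical decomposition of mean squared error into squared bias plus variance. The scores are invented for illustration.

```python
from statistics import fmean, pstdev


def bias_and_noise(judgments, true_value):
    """Split judgment error into a systematic part and a scattered part."""
    errors = [j - true_value for j in judgments]
    bias = fmean(errors)      # average error: which way, and how far
    noise = pstdev(errors)    # spread of the errors around that average
    mse = fmean([e * e for e in errors])  # equals bias**2 + noise**2
    return bias, noise, mse


# Hypothetical risk scores for a case whose correct score is 100.
biased_team = [115.0, 115.0, 115.0, 115.0]  # all wrong in the same direction
noisy_team = [80.0, 120.0, 90.0, 110.0]     # average is right, individuals are not

for name, team in (("biased", biased_team), ("noisy", noisy_team)):
    bias, noise, mse = bias_and_noise(team, 100.0)
    print(f"{name}: bias={bias:.1f} noise={noise:.1f} mse={mse:.1f}")
```

The noisy team's average is exactly right, yet its overall error is larger than the biased team's. An organisation that inspects only the average would see nothing wrong.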

Kahneman’s central claim in Noise (2021) is blunt: in most professional domains, noise is at least as large as bias, and usually larger. Studies of criminal sentencing, insurance underwriting, medical diagnosis, patent evaluation, and personnel assessment all show the same pattern. The variability between professionals making the same judgment is enormous, far exceeding what anyone involved believes. A noise audit, in which the same cases are independently assessed by multiple professionals, reliably shocks the organisation that conducts it. The professionals are shocked because they assumed their colleagues would agree. The leaders are shocked because they assumed the process guaranteed consistency. Both assumptions are wrong.

2. Why Noise Matters More Than Bias for the Deciding Phase

Bias has dominated the decision-quality conversation since Kahneman and Tversky’s original heuristics-and-biases programme in the 1970s. The organisational response has been debiasing: awareness training, structured decision processes, devil’s advocates, pre-mortems. These interventions address systematic error. They do nothing for noise.

The Deciding phase hypothesis is that decisions are design challenges. Noise reframes the design constraint. If your organisation’s decisions are noisy, then the quality of any individual decision is partly a function of who made it and when, not just what information was available or what process was followed. Two teams writing AI specifications for the same domain will produce different specifications, not because they have different information but because professional judgment varies. Two architecture review boards, assessing the same proposal on different days, will reach different conclusions. The variability is not a failure of the individuals. It is a structural feature of human judgment that the decision architecture has not been designed to manage.

Simon’s bounded rationality tells you the decision-maker cannot process everything. Beer’s requisite variety tells you the architecture must deliver information worth processing. Kahneman’s noise tells you that even when the information is right and the process is sound, the judgment applied to that information will vary in ways nobody has measured. This is a third constraint on decision quality, orthogonal to the other two, and most organisations have never even looked for it.

3. The Noise Audit: Measuring What You Have Never Measured

The most practical tool in Noise is the noise audit. The method is simple: present the same case to multiple professionals independently, and measure the variability in their judgments. The insurance company that asks its underwriters to price the same risk. The hiring committee that scores the same candidate independently before discussing. The specification review that has two teams assess the same AI-generated output against the same criteria.

Kahneman reports that when organisations first conduct a noise audit, the results are consistently disturbing. In one study, underwriters at a large insurance company estimated premiums for the same risks. The median difference between underwriters was 55%, more than five times what the company’s executives had predicted. The executives expected 10% variation. The reality was that two underwriters, trained in the same methods, working for the same company, applying the same guidelines, produced judgments that differed by more than half.
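The arithmetic behind a figure like this is simple. The sketch below assumes one common way of expressing the gap between two judgments, the absolute difference divided by their average, and takes the median over every pair of assessors. The function names and the premiums are hypothetical.

```python
from itertools import combinations
from statistics import median


def relative_difference(a, b):
    """Gap between two judgments as a fraction of their average."""
    return abs(a - b) / ((a + b) / 2)


def noise_index(judgments):
    """Median relative difference across all pairs of independent judgments."""
    return median(relative_difference(a, b) for a, b in combinations(judgments, 2))


# Two hypothetical underwriters quoting a premium for the same risk.
premiums = [9500.0, 16700.0]
print(f"{noise_index(premiums):.0%}")  # prints 55%
```

With more than two assessors the same function reports the median gap over every pair, which is the number to compare against what the executives expect.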

For AI transformation, the noise audit has a specific and urgent application. If your organisation relies on human judgment to evaluate AI-generated outputs, to assess specification quality, to approve AI-assisted decisions, then the consistency of that judgment is the ceiling on the quality of your AI process. An AI model that generates consistently good specifications is only as valuable as the human review process that evaluates them. If the review is noisy, the organisation cannot distinguish good AI output from bad, because the evaluation itself varies more than the thing being evaluated.

4. Decision Hygiene: The Structural Response

Kahneman’s response to noise is not cognitive. It is architectural. He calls it decision hygiene: a set of structural interventions that reduce noise without requiring anyone to become a better thinker.

The principles are straightforward. Structure the judgment: replace open-ended assessments with defined dimensions scored independently. Sequence the information: present facts before opinions, evidence before conclusions. Use independent assessment before discussion: have each decision-maker form a judgment before the group convenes. Aggregate judgments mechanically: average the independent assessments rather than letting the loudest voice prevail.

Each of these is a variety management intervention in Beer’s terms. Structured dimensions attenuate the variety of possible judgments to a manageable set. Independent assessment amplifies the variety of perspectives before the group attenuates them through discussion. Mechanical aggregation prevents the social dynamics of the meeting room from destroying the information contained in disagreement.
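Three of the principles, structured dimensions, independent assessment, and mechanical aggregation, can be sketched in a few lines. This is an illustration under stated assumptions rather than a prescribed method: the dimension names, the 1 to 5 scale, and the assessors are all hypothetical.

```python
from statistics import fmean

# Hypothetical review dimensions for an AI-generated specification.
DIMENSIONS = ("correctness", "completeness", "clarity")


def aggregate(assessments):
    """Combine independent scores mechanically, one dimension at a time.

    `assessments` maps each assessor to their own {dimension: score},
    recorded before any group discussion takes place.
    """
    summary = {}
    for dimension in DIMENSIONS:
        scores = [by_dim[dimension] for by_dim in assessments.values()]
        summary[dimension] = {
            "mean": fmean(scores),
            "spread": max(scores) - min(scores),  # disagreement worth discussing
        }
    return summary


independent_scores = {
    "assessor_a": {"correctness": 4, "completeness": 3, "clarity": 5},
    "assessor_b": {"correctness": 2, "completeness": 3, "clarity": 4},
    "assessor_c": {"correctness": 3, "completeness": 3, "clarity": 3},
}

for dimension, stats in aggregate(independent_scores).items():
    print(dimension, stats)
```

Reporting the spread alongside the mean keeps the disagreement visible, so the discussion that follows starts from the points where the assessors actually differ.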

The connection to Evans is direct. Evans’s ubiquitous language is a noise-reduction mechanism. When two teams use the same term to mean different things, the variability in their specifications is partly linguistic noise: the judgment differs because the description differs, not because the domain differs. Establishing a shared vocabulary within a bounded context does not just improve precision. It reduces the noise that imprecision introduces into every downstream decision.

The connection to Argyris is equally direct. Argyris showed that defensive routines suppress disagreement. Kahneman shows that premature convergence, the group rushing to consensus before individual judgments have been formed, destroys the information contained in the disagreement. The remedy is the same: create conditions in which independent judgment can be expressed before social pressure compresses it. But where Argyris frames this as a psychological challenge (overcoming defensiveness), Kahneman frames it as an architectural one (sequencing the process so that convergence happens after, not before, independent assessment). Both are right. The architecture enables what the psychology permits.

5. When to Trust Intuition: The Kahneman-Klein Resolution

The Learning phase article presented Kahneman’s work as a catalogue of cognitive failures. The Deciding phase requires a more nuanced position, because the organisation that distrusts all intuition will be paralysed, and the organisation that trusts all intuition will be deluded. The question is not whether to trust judgment but when.

Kahneman and Gary Klein, whose work on expert intuition reaches the opposite conclusion from the heuristics-and-biases programme, spent years in adversarial collaboration trying to reconcile their positions. The result, published in 2009, is the most useful framework in the decision science literature. They agreed: intuition is trustworthy when two conditions are met. First, the environment must be sufficiently regular that patterns exist to be learned. Second, the decision-maker must have had prolonged practice with valid feedback, meaning feedback that is prompt, clear, and connected to the decision.

A chess master’s intuition is trustworthy because chess is regular and feedback is immediate. A firefighter’s intuition is trustworthy because fire behaviour, while dangerous, follows patterns, and feedback is visceral and fast. An executive’s intuition about AI strategy is not trustworthy, because the environment is novel (no regularities to learn from), the feedback is delayed (outcomes take months to materialise), and the feedback is ambiguous (it is never clear whether the outcome was caused by the decision or by other factors).

This maps onto the Deciding phase architecture. Domains where Evans’s knowledge crunching has produced a mature, validated model are high-validity environments: the domain expert’s intuition about what the specification should say is trustworthy, because they have years of patterned experience with clear feedback. Domains where the model is new, untested, or contested are low-validity environments: judgment should be structured, aggregated, and tested rather than trusted. Beer’s System 3* (the audit channel) is the architectural mechanism for checking whether the domain has enough regularity to justify intuitive judgment. Without the audit, the organisation cannot know whether it is in a high-validity environment or merely believes it is.

For AI, this distinction is critical. The experienced domain expert who says “that AI-generated specification is wrong” may be exercising valid pattern recognition developed over years. The executive who says “AI will transform our business within two years” is almost certainly exercising System 1 pattern-matching in an environment too novel to support it. The organisation needs both kinds of judgment. The decision architecture must distinguish between them.

6. Kahneman’s Limits

Kahneman must be read with the replication crisis visible. Several findings that Thinking, Fast and Slow helped popularise, particularly behavioural priming effects and ego depletion, have not replicated. These came from the wider social psychology literature rather than from Kahneman’s own heuristics-and-biases experiments, but he had endorsed them prominently, and he acknowledged the problem with unusual candour for a Nobel laureate. The core findings on noise, prospect theory, and the conditions for expert intuition remain robust, but the broader picture is less secure than Thinking, Fast and Slow suggests.

The deeper limitation is that Kahneman’s framework is individual. It explains how single decision-makers err. It says less about how organisations amplify or dampen those errors through their structures, cultures, and power dynamics. Beer provides the structural account. Bourdieu explains why the noise audit will be resisted: the variability in professional judgment is not a bug the organisation wants to fix but a feature that protects individual autonomy and status. Standardisation reduces noise, but it also reduces the professional’s sense that their judgment matters. The tension between decision quality and professional identity is real, and Kahneman’s architectural solutions must navigate it.

(An Organisational Prompt is something you can do now....)

Run a noise audit on one decision.

Pick a decision your organisation makes repeatedly using professional judgment. Perhaps it is the assessment of AI-generated code quality. Perhaps it is the prioritisation of features in a product backlog. Perhaps it is the evaluation of vendor proposals. Take one recent case and have it independently reassessed by a different group of equally qualified professionals, without access to the original judgments. Compare the two sets of assessments. The gap between them is the noise in your decision process. You have never measured it. It is almost certainly larger than you expect. And until you measure it, every intervention you design to improve decision quality is optimising a signal you cannot distinguish from the noise surrounding it.

Further Reading

Daniel Kahneman, Olivier Sibony, and Cass Sunstein: Noise: A Flaw in Human Judgment - Kahneman’s final major work. The distinction between bias and noise, the noise audit, and the case for decision hygiene. More immediately actionable than Thinking, Fast and Slow.

Daniel Kahneman: Thinking, Fast and Slow - The popular synthesis of the heuristics-and-biases programme. System 1 and System 2, prospect theory, and WYSIATI. Read it for the cognitive foundations; read Noise for the organisational implications.

Daniel Kahneman and Gary Klein: Conditions for Intuitive Expertise: A Failure to Disagree - The adversarial collaboration that reconciled heuristics-and-biases with naturalistic decision-making. The two conditions for trustworthy intuition: environmental regularity and prolonged practice with valid feedback. The single most useful paper for anyone who needs to know when to trust expert judgment and when to structure the decision instead.

Olivier Sibony: You’re About to Make a Terrible Mistake! - Sibony’s practitioner-oriented treatment of decision quality in organisations. More accessible than Kahneman, with worked examples from business strategy.

Simon: The Decision Architecture of Good Enough

Justin Arbuckle — Mon, 18 May 2026 07:00:55 GMT

The Deciding phase of this series rests on three levers. Beer governs Interaction: the structural architecture through which decisions flow. Evans governs Information: the precision and pathology of domain description. The third lever is Identity: what is available to the decision-maker before the decision begins. Not what they choose, but what they can see, what they consider, what they take for granted, and what they never think to question. In the Learning phase, Bourdieu governed this lever through habitus: the embodied dispositions that generate practice below conscious awareness. In the Deciding phase, the governor is Herbert Simon.

Simon’s argument is deceptively simple: human beings cannot be rational in the way that classical economics assumes. They do not have complete information. They cannot evaluate all alternatives. They cannot compute optimal solutions to complex problems. This is not a character flaw. It is a structural feature of human cognition confronting a world more complex than any mind can process. Simon called it bounded rationality, and it is the single most important concept in organisational decision theory, because everything else follows from it: how organisations should be structured, how information should flow, how decisions should be distributed, and why most organisations get all of these wrong.

Simon was an extraordinary polymath. He won the Nobel Prize in Economics in 1978 for his work on decision-making in organisations and the Turing Award in 1975 (with Allen Newell) for contributions to artificial intelligence. He spent most of his career at Carnegie Mellon, where he helped found one of the world’s first computer science departments. His key works span from Administrative Behavior (1947) through Organizations (with James March, 1958), The Sciences of the Artificial (1969), and the landmark essay The Architecture of Complexity (1962). He was simultaneously a political scientist, an economist, a cognitive psychologist, and a computer scientist. He could hold all of these in his head at once, which is precisely the kind of cognitive feat his own theory says most people cannot manage.

1. Bounded Rationality: The Constraint That Shapes Everything

Classical economics assumes “economic man”: a decision-maker with complete information, unlimited computational capacity, and the ability to select the option that maximises utility. Simon demonstrated that this is a fiction. Real decision-makers face three binding constraints: limited information (they rarely know all the alternatives or their consequences), cognitive limits (the human mind cannot fully process even the information it does have), and time pressure (most decisions must be made before exhaustive analysis is possible).

Simon replaced economic man with administrative man: a decision-maker who operates within these bounds and does the best they can given what they have. This is not irrationality. It is rationality operating under realistic constraints. The interesting question, Simon argued, is not “did you choose optimally?” but “did you decide well given what you could know?”

The implication for the Deciding phase is foundational. If bounded rationality is real, and the evidence is overwhelming, then the quality of organisational decisions depends more on the design of decision processes and information flows than on the intelligence of individual decision-makers. You can hire brilliant people and they will still make poor decisions if the architecture feeds them the wrong information, at the wrong time, in the wrong format, under the wrong constraints. The design of the decision environment is the lever. The individual is the variable the design must accommodate.

This is the parallel to Bourdieu that gives Simon the Identity lever. Bourdieu’s habitus constrains what the learner can perceive: the embodied dispositions acquired through socialisation filter what is thinkable before conscious thought begins. Simon’s bounded rationality constrains what the decision-maker can process: the cognitive architecture filters what is decidable before deliberation begins. Both govern through limitation on what is available to the subject. Bourdieu’s limitation is sociological. Simon’s is cognitive. Together they explain why organisations reproduce their existing patterns: the people inside them literally cannot see, think, or decide their way to alternatives that fall outside the bounds their identity imposes.

2. Satisficing: The Rationality That Actually Works

Simon coined the term satisficing (from satisfy and suffice) to describe how people actually choose. Rather than evaluating all alternatives to find the optimum, the decision-maker sets an aspiration level, a threshold of what counts as good enough, and chooses the first option that meets it. If nothing meets it, the aspiration is lowered. If options come easily, the aspiration is raised.

This sounds like settling. It is not. It is the only rational strategy when the cost of searching for the optimum exceeds the benefit of finding it, when the optimum may be unknowable even in principle, or when the scarce resource of attention must be conserved for decisions where it matters most. Satisficing is not the failure to optimise. It is the recognition that optimisation is itself a choice about where to spend cognitive resources, and spending them on search when the first adequate option is available is a waste.
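The rule is simple enough to sketch in code. What follows is a minimal illustration, not Simon's own formalisation: the class name, the fixed adjustment step, and the definition of "came easily" (an adequate option found on the first look) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class Satisficer:
    """Hypothetical sketch of a satisficing decision-maker."""
    aspiration: float      # current threshold for "good enough"
    step: float = 0.1      # assumed size of each aspiration adjustment
    easy_after: int = 1    # found within this many looks counts as "came easily"

    def choose(self, options: Sequence[T], score: Callable[[T], float]) -> Optional[T]:
        # Examine options in the order they present themselves.
        for looked_at, option in enumerate(options, start=1):
            if score(option) >= self.aspiration:
                if looked_at <= self.easy_after:
                    # An adequate option came easily: raise the aspiration.
                    self.aspiration += self.step
                # Stop searching: the first adequate option wins.
                return option
        # Nothing met the threshold: lower the aspiration for the next attempt.
        self.aspiration -= self.step
        return None


if __name__ == "__main__":
    # Illustrative scores only; the option names are made up.
    scores = {"A": 0.5, "B": 0.85, "C": 0.95}
    decider = Satisficer(aspiration=0.8)
    print(decider.choose(list(scores), scores.__getitem__))  # B, although C scores higher
```

The search stops at B and never examines C. An optimiser would have kept looking; the satisficer has already moved its attention to the next decision.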

Organisations satisfice systematically, whether they admit it or not. Standard operating procedures are satisficing strategies: they prescribe a response that is good enough for routine situations, conserving attention for exceptions. Departmentalisation is a satisficing strategy: break the problem into pieces small enough for bounded minds to handle. Hierarchy is a satisficing strategy: allocate different types of decisions to different levels so that no single level is overwhelmed.

For AI transformation, satisficing reframes the entire conversation. The organisation that insists on finding the “optimal” AI strategy before acting is not being rigorous. It is violating Simon’s insight: the optimal strategy is unknowable in a domain this complex, and the cost of searching for it (in time, in paralysis, in opportunity forgone) exceeds the benefit. Rumelt’s proximate objectives are the strategic application of satisficing: set a target close enough to be feasible, act, learn, adjust the aspiration, and act again. The organisation that waits for the optimal specification template, the optimal governance framework, the optimal AI platform, will still be waiting when its competitors have satisficed their way through three iterations.

3. Decision Premises: How Organisations Actually Shape Decisions

Simon’s most radical contribution to organisational theory is the concept of decision premises. He reconceptualised organisations not as authority structures that control what people do, but as information systems that shape the premises entering individual decisions.

A decision premise is any input that influences a decision: a fact, a value, a goal, a constraint, an assumption. Authority and influence operate not by controlling decisions directly but by controlling the premises that enter them. When the organisation sets goals, it supplies value premises. When it provides data, it supplies factual premises. When it establishes procedures, it determines which premises are considered and which are excluded. When it defines roles, it determines whose premises count.

This reframing is transformative for the Deciding phase. The question is not “who should make this decision?” but “what premises should enter this decision, and how does the organisation ensure they get there?” A badly designed organisation does not produce bad decisions because its people are stupid. It produces bad decisions because the wrong premises reach the right people, or the right premises reach the wrong people, or the right premises reach the right people too late for them to matter.

The connection to Evans is structural. Evans’s ubiquitous language is a mechanism for aligning decision premises across a team. When the payments team and the fraud team use the word “customer” to mean different things, the premises entering their decisions are different even when the facts are the same. The linguistic divergence is not a communication problem. It is a decision premise problem: the two teams are deciding from different starting points, and their decisions will diverge accordingly. Evans’s knowledge crunching is the process of negotiating shared premises between domain experts and developers. Beer’s System 2 (coordination) is the architectural mechanism that ensures premises are shared across autonomous units without crushing their autonomy.

The connection to Drucker is equally direct. Drucker’s theory of the business is the set of assumptions (premises) about environment, mission, and competencies that pre-decides most questions before they are asked. When the theory expires, the premises are wrong, and every decision that flows from them is wrong. Drucker asks “are our assumptions still valid?” Simon asks the design question: “how do we ensure that valid premises reach the people who need them?”

4. Attention as the Scarce Resource

In 1971, Simon identified what has become one of the defining insights of the information age: “In an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention.”

This was written before the web. It was written before email had reached most offices. It now reads as prophecy. The problem in organisations is not information scarcity but attention scarcity. Most information system designers get this backwards: they build systems that produce more information when what is needed is systems that filter information and allocate attention.

For AI, this insight is devastating. AI amplifies information production by orders of magnitude. A single prompt can generate analyses, code, specifications, and reports that would have taken teams weeks. The organisation that deploys AI without redesigning its attention architecture will drown. More AI-generated output is not more value. It is more demand on the one resource that cannot be scaled: human attention.

Beer’s variety management is the structural expression of Simon’s attention insight. Attenuation (filtering incoming variety) is attention management: deciding what not to look at. Amplification (broadening the response repertoire) is attention allocation: ensuring that what does reach the decision-maker is worth their cognitive resources. Beer provides the architecture. Simon explains why the architecture is necessary: because the human mind is the bottleneck, and the bottleneck cannot be widened, only managed.

The 3-4 homeostat that Beer describes, the balance between inside-and-now (System 3) and outside-and-then (System 4), is an attention allocation mechanism in Simon’s terms. The organisation has a finite attention budget. How much goes to optimising current operations and how much goes to sensing the environment is a zero-sum allocation. The committee that meets weekly to review AI adoption metrics is consuming attention that could be spent on sensing how AI is changing the competitive landscape. Both matter. The budget is finite. Simon tells you the budget exists. Beer tells you how to allocate it.

5. The Architecture of Complexity: Why Your Organisation Must Be Decomposable

Simon’s 1962 essay “The Architecture of Complexity” provides the theoretical foundation for everything from microservices to bounded contexts to team topologies. His argument: complex systems that evolve and persist tend to be hierarchically organised, where hierarchy means subsystems containing subsystems, not necessarily authority relationships. He called this near-decomposability: interactions within subsystems are stronger than interactions between them.

The Hora and Tempus parable makes the point vivid. Two watchmakers build watches of a thousand parts. Tempus assembles his sequentially; any interruption means starting over. Hora designs his from stable subassemblies of ten parts each; an interruption loses only the current subassembly. Hora prospers. Tempus goes bankrupt. The lesson: systems composed of stable intermediate forms evolve far more rapidly than systems that must be assembled all at once.
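The arithmetic behind the parable can be sketched in a few lines. The model here is an assumption, not something the parable as told above specifies: each attempt to add a part is interrupted with a fixed probability, an interruption destroys the unfinished assembly, and Hora's watch is a three-level hierarchy of ten-piece assemblies (100 + 10 + 1 = 111 in total). The interruption probability of 0.01 and the function names are illustrative choices.

```python
def expected_steps(n_parts: int, p_interrupt: float) -> float:
    """Expected placement attempts to finish one assembly of n_parts.

    Assumed model: each attempt succeeds with probability q = 1 - p_interrupt;
    an interruption resets the unfinished assembly to zero parts.
    The expected number of attempts to get n consecutive successes
    is (q**-n - 1) / (1 - q).
    """
    if not 0.0 <= p_interrupt < 1.0:
        raise ValueError("p_interrupt must be in [0, 1)")
    if p_interrupt == 0.0:
        return float(n_parts)  # no interruptions: one attempt per part
    q = 1.0 - p_interrupt
    return (q ** -n_parts - 1.0) / p_interrupt


def tempus_steps(p_interrupt: float, parts: int = 1000) -> float:
    """Tempus builds the whole watch as a single fragile assembly."""
    return expected_steps(parts, p_interrupt)


def hora_steps(p_interrupt: float, parts: int = 1000, group: int = 10) -> float:
    """Hora builds stable subassemblies of `group` pieces, level by level.

    Assumes `parts` is an exact power of `group`.
    """
    assemblies = 0
    units = parts
    while units > 1:
        units //= group       # this level yields `units` finished assemblies
        assemblies += units
    return assemblies * expected_steps(group, p_interrupt)


if __name__ == "__main__":
    p = 0.01  # assumed chance of interruption per part placed
    print(f"Tempus: {tempus_steps(p):,.0f} expected attempts per watch")
    print(f"Hora:   {hora_steps(p):,.0f} expected attempts per watch")
    print(f"Ratio:  {tempus_steps(p) / hora_steps(p):,.0f}x")
```

Under these assumptions Tempus needs well over a thousand times as many attempts per finished watch as Hora. The exact ratio depends on the model; the order of magnitude of the gap is the point.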

This is the theoretical statement that Beer operationalised as the VSM’s recursive structure and that Evans operationalised as bounded contexts. Simon’s nearly decomposable systems are Beer’s System 1 units (semi-autonomous subsystems with strong internal cohesion and weaker external coupling) and Evans’s bounded contexts (domains with their own model, language, and team). The three governors converge: Simon provides the theory (complex systems must be decomposable to be manageable), Beer provides the architecture (each subsystem must be a viable system with its own five functions), and Evans provides the domain design (each context must have its own model and ubiquitous language).

For AI transformation, near-decomposability is the architectural argument against the monolithic AI strategy. The organisation that attempts to deploy AI as a single, enterprise-wide programme is Tempus: any disruption (a change in technology, a failed pilot, a leadership transition) means starting over. The organisation that designs AI adoption as a set of semi-autonomous experiments within bounded contexts is Hora: each experiment is a stable intermediate form that can succeed or fail without destroying the others. Conway’s Law, which we will address later in the series, provides the structural mechanism: the system mirrors the communication structure, which means the decision about how to decompose the organisation is simultaneously a decision about the architecture of what it builds.

6. Design as the Core Activity

Simon’s broadest claim is also his most relevant for this series. “Everyone designs who devises courses of action aimed at changing existing situations into preferred ones.” This definition unifies engineering, management, medicine, and education as design disciplines. It is also the philosophical foundation of the Deciding phase hypothesis: decisions are design challenges, and design is a sequence of decisions under constraint.

An artificial (designed) system is an interface between an inner environment (the substance and organisation of the artefact) and an outer environment (the surroundings in which it operates). If the inner environment is appropriate to the outer, the artefact serves its purpose. The design challenge is matching the inner system to the demands of the outer environment, and bounded rationality is the constraint that makes this matching imperfect, iterative, and never complete.

This is where Simon connects to Ackoff and to the deciding hypothesis most directly. Ackoff’s dissolving (redesigning the system so the problem disappears) is Simon’s design made radical: not just matching inner to outer environment but transforming the inner environment so fundamentally that the mismatch ceases to exist. Boyd’s OODA loop is Simon’s design process made temporal: the continuous cycle of observing the outer environment, orienting the inner model, deciding on the match, and acting to improve it. Rumelt’s kernel (diagnosis, guiding policy, coherent action) is Simon’s design process made strategic: diagnose the mismatch between inner and outer environment, establish a guiding policy for closing it, and execute coherent actions.

The AI implication is that AI does not replace the designer. It amplifies the design cycle. The specification writer who uses AI to generate implementations is designing faster, but they are still designing: still matching their understanding of the domain (inner environment) to the demands of the business (outer environment) under the constraints of bounded rationality. The constraint has not changed. The speed has. And speed without design quality, as Simon would insist, produces more artefacts that fail to match their environment, not fewer.

7. Simon’s Limits

Simon must be read with his limitations visible. His framework underplays power and politics: decision premises are not distributed neutrally, and the question “whose premises enter the decision?” is a political question that Simon’s design-oriented framework does not adequately address. Bourdieu explains why: the premises that enter decisions are shaped by the distribution of capital within the field, and those with the most capital shape the premises that favour their position. Giddens adds the structural dimension: the premises are embedded in the rules and resources that are reproduced through daily practice, and changing the premises means changing the structure, which the structure resists.

Simon also underestimates emotion and identity. His framework acknowledges but does not deeply explore how identity concerns shape what people consider decidable. Heifetz names this gap: adaptive challenges are situations where the decision-maker must revise their own identity before they can see the alternatives that bounded rationality has filtered out. The bounds are not only cognitive. They are existential. The leader who cannot imagine their organisation without the function they built cannot see the decision to abandon it, not because the information is missing but because the identity will not permit it.

The deepest limitation is that Simon’s model explains routine and expert decision-making well but is weaker on genuinely creative or transformative decisions. Near-decomposability assumes that the system can be broken into manageable pieces. Some problems resist decomposition. Some situations require the kind of holistic reorientation that Boyd describes and that Stacey’s complex responsive processes theory argues cannot be designed at all. Simon would reply that even in these situations, the design of stable intermediate forms is the best strategy available. The debate is genuine, and the series holds both positions.

8. Why Simon Governs the Identity Lever

Simon governs Identity in the Deciding phase because his work defines what the decision-maker can and cannot do. Bounded rationality is not a flaw to be corrected. It is the condition within which all deciding happens. Satisficing is not settling. It is the only rational strategy when optimisation is impossible. Decision premises are not inputs to decisions. They are the identity of the decision: the set of facts, values, goals, and constraints that determine what the decision-maker considers, what they ignore, and what they never think to question.

The parallel to Bourdieu is exact. Bourdieu says: you cannot learn what your habitus will not let you perceive. Simon says: you cannot decide what your cognitive bounds will not let you process. Both govern through constraint on what is available. The leader who wants to improve decision quality must therefore work on two fronts simultaneously: the sociological (changing the habitus through new practice, new exposure, new fields) and the cognitive (redesigning the decision environment so that the right premises reach the right people at the right time). Beer provides the architecture for the second. Bourdieu provides the theory of the first. Simon tells you why both are necessary.

(An Organisational Prompt is something you can do now....)

Audit the premises entering one decision.

Pick a decision your organisation makes repeatedly: which AI use cases to prioritise, which teams to fund, which specifications to approve. Do not evaluate the decision. Evaluate the premises. What information enters the decision? What information does not? Who provides the facts? Who provides the values? Who decides what counts as “good enough”? Map the premises on a single page. The pattern will reveal that the decision is largely pre-decided by the premises that enter it, and that the premises are shaped by the organisation’s structure, not by the decision-maker’s judgment. If you want different decisions, redesign the premises. The people are not the problem. The architecture is.
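If a structured page helps, the map can be kept as a small data structure. This is only one possible layout, offered as a sketch: the five premise kinds come from the definition of a decision premise earlier in this article, while the field names, the audit function, and the example entries are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# The kinds of premise named in the definition of a decision premise.
KINDS = ("fact", "value", "goal", "constraint", "assumption")


@dataclass(frozen=True)
class Premise:
    text: str                # what the premise says
    kind: str                # one of KINDS
    source: str              # who or what supplies it
    reaches_decision: bool   # does it actually enter the decision?


def audit(premises: Iterable[Premise]) -> Dict[str, Dict[str, List[Tuple[str, str]]]]:
    """Group premises by kind, separating those that enter the decision
    from those that exist in the organisation but never reach it."""
    by_kind = defaultdict(list)
    for premise in premises:
        if premise.kind not in KINDS:
            raise ValueError(f"unknown premise kind: {premise.kind}")
        by_kind[premise.kind].append(premise)
    return {
        kind: {
            "entering": [(p.text, p.source) for p in by_kind[kind] if p.reaches_decision],
            "excluded": [(p.text, p.source) for p in by_kind[kind] if not p.reaches_decision],
        }
        for kind in KINDS
    }


if __name__ == "__main__":
    # Entirely made-up entries for a use-case prioritisation decision.
    page = audit([
        Premise("Pilot cut handling time", "fact", "AI team", True),
        Premise("Customers distrust automated advice", "fact", "support staff", False),
        Premise("Protect current margins", "value", "finance", True),
    ])
    for kind, columns in page.items():
        print(kind, columns)
```

The "excluded" column is usually the interesting one: it lists what the organisation knows but the decision never sees. An empty row is a finding too, since it means nobody supplies that kind of premise at all.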

Further Reading

Herbert Simon: Administrative Behavior - The foundational text on organisational decision-making. Bounded rationality, satisficing, and decision premises. Written in 1947 and revised over fifty years, it remains the single most important book on how organisations actually decide.

Herbert Simon: The Sciences of the Artificial - The broader framework on design, complexity, and artificial systems. The definition of design as changing existing situations into preferred ones. Essential reading for anyone who believes the Deciding phase argument that decisions are design challenges.

Herbert Simon: The Architecture of Complexity - The single most important essay on hierarchical systems, near-decomposability, and why complex systems evolve from simple ones. Twenty pages that provide the theoretical foundation for bounded contexts, microservices, and team topologies.

Herbert Simon and James March: Organizations - The collaborative work on how organisations shape behaviour through routines, premises, and structures. March’s contribution on exploration and exploitation extends Simon’s framework into the strategic domain.

Herbert Simon: Designing Organizations for an Information-Rich World - The attention economy essay. “A wealth of information creates a poverty of attention.” Remarkably prescient; freely accessible.

Christensen: How 'Good' Decisions Can Destroy Transformation

Justin Arbuckle — Thu, 14 May 2026 07:01:05 GMT

Clayton Christensen asks you to consider the possibility that your organisation is failing because its leaders are competent.

Not despite their competence. Because of it. The better they are at listening to customers, investing in higher-margin opportunities, and improving existing products, the more reliably they will miss the thing that eventually replaces them. This is not a metaphor. Christensen documented it happening, with mechanical precision, across disk drives, steel, retail, and education. The pattern is always the same: the incumbent does everything right and loses anyway.

Most management theory assumes that failure comes from bad decisions. Christensen’s contribution is the demonstration that failure can come from good ones. That is a far more uncomfortable idea, and it is the one that matters for AI transformation. Because if failure only came from bad decisions, you could fix it with better analysis, better leadership, or better governance. But if failure comes from the structure of rational decision-making itself, from the processes and values that define what the organisation considers worth doing, then the fix is not better management. It is different management, operating in a different structure, with different criteria for success. And that, as we shall see, is the one thing the existing organisation is least equipped to create.

A note on scope. Christensen (1952-2020) published extensively across innovation, strategy, education, healthcare, and personal philosophy. The Innovator’s Dilemma (1997) remains the foundational work. The Innovator’s Solution (2003), co-authored with Michael Raynor, extended the theory into prescriptive territory. Competing Against Luck (2016), co-authored with Taddy Hall, Karen Dillon, and David Duncan, developed the Jobs to Be Done framework. This article focuses on three interlocking ideas: the disruption mechanism, the RPV (Resources, Processes, Values) framework for organisational capability, and Jobs to Be Done as a tool for purpose clarity. It does not attempt a comprehensive survey.

I was lucky enough to be taught by Clay on a Harvard Business School course on innovation in the early 2000s. It remains the best single teaching experience I have ever had, and I hope I manage to communicate enough of his insight to justify the time he spent teaching me.

1. The Disruption Mechanism: Why Good Management Causes Failure

Christensen’s central insight is counterintuitive and, once understood, profoundly uncomfortable. Disruptive innovation is not about better technology defeating worse technology. It is about a specific process by which new entrants, typically with fewer resources and initially inferior products, displace established firms that are doing everything right by conventional management standards.

The mechanism operates through a sequence that Christensen documented across industries from disk drives to steel to retail. A new entrant introduces a product that is simpler, cheaper, and often worse on the performance dimensions that mainstream customers value. Because the product is worse on those dimensions, the incumbent rationally ignores it. The new entrant’s product appeals to customers at the low end of the market, or to customers who were previously non-consumers. The incumbent, listening to its best customers and pursuing higher margins, moves upmarket. Meanwhile, the new entrant improves its product along a sustaining trajectory until it meets the needs of the mainstream market. By the time the incumbent recognises the threat, its cost structure, processes, and organisational identity make an adequate response nearly impossible.

The critical word is rationally. The incumbent is not asleep. It is making the decisions that every MBA programme, every board, every management framework would endorse: listen to your customers, invest in higher-margin opportunities, improve your existing products. Christensen’s contribution is the demonstration that this rationality, applied consistently, produces its own destruction.

Weber diagnosed the same mechanism at a deeper level. His means-ends rationality (“given this goal, what is the most efficient way to achieve it?”) is precisely the logic that drives the incumbent upmarket. The question “are we serving our most profitable customers?” is a means-ends question. The question “should we be serving different customers entirely?” is a value-rationality question, and the organisation has no structural mechanism for processing it.

Christensen provides the market-level evidence for what Weber described as a civilisational tendency: the progressive displacement of the question “is this the right goal?” by the question “are we achieving this goal efficiently?”

Argyris would recognise the psychological dimension. The defensive routines he documented (the suppression of information that threatens existing strategies, the skilled incompetence that prevents leaders from acknowledging what they do not know) are the human-level expression of the innovator’s dilemma. The data showing that a cheap, inferior product might eventually threaten the core business is precisely the kind of information that defensive routines are designed to suppress. Not through conspiracy, but through the ordinary operations of Model I behaviour: maintain control, suppress negative feelings, be rational, win. When “being rational” means pursuing the higher-margin opportunity, the disruptive threat becomes undiscussable.

2. The RPV Framework: Why Organisations Cannot Do What Their Leaders Decide to Do

Christensen’s second major contribution, less well known than disruption theory but arguably more useful for practitioners, is the RPV framework. An organisation’s capabilities and disabilities are defined by three factors: its Resources, its Processes, and its Values.

Resources are what the organisation has: people, technology, cash, brand, relationships, data. Resources are the most visible and most transferable element. You can hire new people, buy new technology, allocate new budget. Most organisational responses to AI begin here: hire data scientists, license AI platforms, establish an AI centre of excellence. Resources are necessary but not sufficient.

Processes are how the organisation converts resources into outcomes: the patterns of interaction, communication, coordination, and decision-making that have evolved to support the core business. Processes include formal procedures (approval workflows, budgeting cycles, release processes) and informal patterns (how decisions actually get made, who talks to whom, what information flows where). Processes are much harder to change than resources because they are embedded in organisational muscle memory. They are, in Pierre Bourdieu’s terms, the institutional habitus: the dispositions, routines, and taken-for-granted practices that reproduce the organisation’s way of working regardless of what the strategy says.

Values are what the organisation prioritises: the criteria by which employees make decisions about resource allocation. Christensen uses “values” in a specific, non-moral sense. An organisation’s values determine what it finds worth doing: which opportunities to pursue, which customers to serve, which margin thresholds to accept. Over time, values optimise around the organisation’s cost structure and business model. A company accustomed to 40% gross margins will systematically deprioritise opportunities that offer 15% margins, regardless of what the strategy deck says about “new markets” or “transformation.”

The RPV framework explains why adding resources (the typical first response) rarely produces transformation. The new AI team arrives with resources: talented people, modern tools. But it operates within existing processes: the same approval cycles, the same budgeting logic, the same release management. And it is evaluated by existing values: the same margin expectations, the same customer priorities, the same quarterly targets. The result is that the new resources are absorbed into the existing system, producing sustaining improvements to the current business rather than the transformative change the organisation intended.

Anthony Giddens described this as structuration: the way that structures are both produced by and productive of action. The processes and values of the organisation are not inert constraints. They are actively reproduced through the daily decisions of every employee, and through that reproduction they shape what kinds of decisions are thinkable. When the AI team’s budget request goes through the standard capital allocation process, the process itself determines which projects survive. Projects that fit the existing business model score well. Projects that challenge it score poorly. Not because anyone decided to block transformation, but because the process was designed to optimise the existing business, and it does so with impressive reliability.

Talcott Parsons would add that the values dimension operates as a pattern-maintenance function. The organisation’s values, in Christensen’s sense, are the operational expression of what Parsons called the latent pattern-maintenance subsystem: the mechanism by which the organisation preserves its core identity and assumptions against perturbation. The AI initiative is a perturbation. The values system will absorb it, redirect it, and domesticate it until it serves the existing pattern rather than disrupting it.

3. Jobs to Be Done: The Clarity Mechanism That Most Organisations Lack

Christensen’s third major contribution addresses the question that the series has been building toward: how does an organisation get clear on what it should actually be doing?

Jobs to Be Done (JTBD) theory, developed with Bob Moesta and popularised in Competing Against Luck, reframes the fundamental question of purpose. Instead of asking “what products should we build?” or “what markets should we serve?”, it asks: “what job is the customer hiring our product to do?” The “job” is the progress a person is trying to make in a particular circumstance. It has functional dimensions (what the product must do), social dimensions (how the customer wants to be perceived), and emotional dimensions (how the product makes the customer feel).

Christensen’s famous milkshake example illustrates the power of the reframe. A fast-food company wanted to improve milkshake sales. Traditional market research (demographics, taste preferences, competitive analysis) produced incremental improvements that did not move sales. A JTBD analysis revealed that nearly half of milkshakes were sold before 8:30am to commuters who had a long, boring drive ahead and needed something that would occupy them for twenty minutes and keep hunger at bay until lunch. They were not buying a milkshake. They were buying a breakfast companion for a tedious commute. The competitors were not other milkshakes. They were bananas, bagels, doughnuts, and boredom. This reframe produced entirely different design criteria: thickness (to last the drive), chunkiness (to provide interest), portability (to work with one hand on the steering wheel).

The JTBD framework is a purpose clarity mechanism. It forces the organisation to articulate, with precision, what problem it is solving, for whom, under what circumstances. This is exactly the discipline that Peter Drucker demanded when he insisted that the purpose of a business is to create a customer, and that every organisational decision must be traceable to value created for someone outside the organisation. JTBD operationalises Drucker’s principle by providing a specific methodology for discovering what that value actually is.

The connection to specification-driven development is direct. The JTBD framework answers, at the strategic level, the same question that a specification answers at the implementation level: what exactly are we trying to achieve, for whom, under what constraints? A specification that cannot be traced to a job the customer needs done is a specification without purpose. It may be syntactically valid, structurally consistent, and fully tested, and it will still produce something nobody needs.

4. Disruption as a Purpose Problem: Why AI Adoption Gets Stuck

Christensen’s three frameworks, taken together, explain a pattern visible in almost every large enterprise AI programme.

The disruption mechanism predicts that the organisation will invest in AI primarily to improve its existing products and processes (sustaining innovation) rather than to create fundamentally new value propositions. This is not a mistake. It is the rational response of an organisation whose processes and values are optimised for the current business.

The RPV framework predicts that even when leadership mandates transformative AI adoption, the organisation’s processes (budgeting, governance, approval, measurement) and values (margin thresholds, customer priorities, risk tolerance) will redirect that mandate into sustaining activity. The AI centre of excellence will produce efficiency improvements. It will not produce disruption. This is not because the people in the centre are unimaginative. It is because the organisational system within which they operate was designed to optimise the core business, and it does so with the reliability that Taylor’s scientific management always aspired to.

The JTBD framework reveals the deeper problem: most AI programmes do not know what job the AI is being hired to do. “Leverage AI for competitive advantage” is not a job. “Help the underwriting team process applications 40% faster” is closer, but it is still a sustaining improvement to an existing process. “Enable a customer who cannot currently afford specialist financial advice to get personalised guidance at a fraction of the cost” is a job. It is also, in Christensen’s terms, potentially disruptive, and therefore precisely the kind of opportunity the organisation’s values system will deprioritise.

Heifetz would say: the organisation is treating an adaptive challenge as a technical one. AI adoption is not a technical challenge of implementing tools. It is an adaptive challenge of discovering what the organisation should become, which requires changes in values, habits, and identity that cannot be specified in advance. The JTBD framework provides a method for that discovery, but the method itself requires the organisation to ask questions that its current structure is designed to prevent.

Richard Normann provides a strategic framing. His concept of the Prime Mover, the leader who reorganises the entire value constellation rather than competing within the existing one, maps directly onto Christensen’s distinction between sustaining and disruptive innovation. The leader who uses AI to make existing processes faster is competing within the constellation. The leader who uses AI to reconfigure the relationships between human expertise, machine capability, and customer self-service is attempting ecogenesis. Normann’s insight is that the second path requires conceptual elegance, not just organisational energy. You cannot disrupt your way to a new value constellation by doing the same things faster. You must reframe what is being done, for whom, and why.

5. The Separation Solution and Its Organisational Cost

Christensen’s prescription for the innovator’s dilemma is structural: create a separate organisational unit with its own processes and values, free from the gravitational pull of the core business. The separate unit must have its own resource allocation, its own cost structure, its own success metrics, and, critically, its own proximity to the customers whose jobs it is trying to understand. Only the CEO can ensure this separation survives, because only the CEO has the authority to override the core organisation’s natural impulse to absorb, redirect, or defund the disruptive effort.

This prescription has been widely adopted and widely misapplied. The “innovation lab” or “digital accelerator” is a common organisational response, but many such units fail because they achieve separation without purpose. They are structurally isolated from the core business but have not done the JTBD work to identify what job they are actually solving.

Weick adds a dimension that Christensen underplays. The separate unit must not only have different processes and values. It must have different sensemaking frameworks. The people in the unit must be able to perceive opportunities that are invisible to the core organisation. This requires not just structural separation but cognitive separation: different mental models, different frames of reference, different criteria for what counts as interesting. Weick’s concept of enacted sensemaking suggests that what the unit does will determine what it sees. If it is doing demonstrations for the executive committee, it will see opportunities that impress executives. If it is sitting with non-consumers trying to understand their struggles, it will see opportunities that serve unmet needs.

Stacey would challenge the entire notion of a planned separation. In his framework, genuine innovation emerges from patterns of interaction that nobody designs. The separate unit is itself a designed response to an emergent phenomenon, and designing the response risks domesticating the very novelty it aims to cultivate. This is not a fatal objection; Christensen would argue that the separate unit creates the conditions for emergence that the core organisation suppresses. But it is a caution: the unit’s purpose cannot be designed from above. It must be discovered through the iterative, messy, failure-rich process of doing actual work with actual customers on actual problems.

6. What Christensen Gets Wrong, and What the Critics Reveal

Christensen’s work has attracted serious criticism, most notably from historian Jill Lepore, who questioned the evidentiary basis of his case studies and argued that disruption theory lacks predictive power. Christensen himself acknowledged that disruption theory is frequently misapplied: not every competitive threat is disruptive, not every startup will defeat the incumbent, and incumbents sometimes respond successfully.

The most substantive criticism, for this series, is not about whether the theory predicts correctly. It is about what the theory assumes about organisational agency. Christensen’s disruption mechanism is structural and deterministic: the incumbent’s processes and values make an adequate response nearly impossible. This leaves little room for the kind of organisational learning that Argyris and Senge describe, in which organisations can, under the right conditions, surface their own assumptions and change their own behaviour. Christensen’s prescription (create a separate unit) is a structural workaround for a learning failure. It does not address the learning failure itself.

Dweck’s research suggests that the “fixed” quality Christensen attributes to organisational values may be more malleable than his theory implies. A fixed mindset treats capability as static; the organisation’s values are what they are, and the only response is structural separation. A growth mindset treats capability as developable; perhaps the organisation can learn to hold multiple value systems simultaneously, to evaluate opportunities against both sustaining and disruptive criteria. This is not easy. But dismissing it as impossible may be premature.

7. What Christensen Contributes to the Series

For AI transformation specifically, Christensen contributes three things that the thinkers already reviewed do not provide.

First, a market-level explanation for why organisations invest in AI and still fail to transform. The disruption mechanism explains this not as a leadership failure or a cultural problem but as a structural consequence of rational management within an optimised system. This complements the psychological explanations (Argyris, Dweck), the sociological explanations (Bourdieu, Giddens), and the systems explanations (Beer, Stacey) already in the series.

Second, the RPV framework provides a diagnostic tool that connects market dynamics to organisational capability. When a CTO asks “why can we not adopt AI more effectively?”, the RPV framework directs attention away from resources (where most organisations look first) and toward processes and values (where the actual constraints operate).

Third, Jobs to Be Done provides a methodology for the purpose clarity that the series has identified as the precondition for effective action. It is not sufficient on its own; it must be connected to domain modelling, to specification-driven development, and to the organisational learning conditions documented in the first phase. But it provides the strategic starting point: before you can specify what to build, you must understand what job the customer needs done.

The synthesis is this: Christensen explains why organisations rationally fail to disrupt themselves. The series’ earlier thinkers explain the psychological, social, and structural mechanisms through which that failure is enacted. And the JTBD framework, connected to the specification practices described in the Deciding phase, provides a path from vague aspiration to precise purpose. The organisation that can answer “what job is the customer hiring us to do?”, trace that answer through domain models into bounded contexts, and express those contexts as specifications precise enough for AI to act on, has solved the clarity problem. The organisation that cannot is producing strategy decks.

8. LLMs as Disruptors: When the New Entrant Improves Faster Than the Theory Predicted

Christensen’s disruption mechanism assumes a particular tempo. The new entrant starts inferior, improves along a sustaining trajectory, and eventually meets mainstream needs. In the cases he studied (disk drives, steel minimills, discount retail), that trajectory took years, sometimes decades. Incumbents had time. They chose not to respond, but they had time.

Large language models have compressed that timeline to the point where Christensen’s own framework needs updating.

In March 2025, the AI evaluation organisation METR published research showing that the length of software engineering tasks AI models can complete autonomously has been doubling approximately every seven months since 2019. By January 2026, METR’s updated analysis showed the post-2023 trend had actually accelerated: the doubling time had shortened to roughly 89 days. In late 2025, Anthropic’s Claude Opus 4.5 demonstrated the ability to independently complete tasks that would take a human professional approximately five hours, a capability that exceeded even the exponential trend’s predictions.

This is not the slow, steady upmarket march that Christensen documented. This is a capability frontier that shifts faster than any organisational planning cycle can track. And it is producing disruption patterns that are both recognisable from Christensen’s framework and fundamentally different in their dynamics.

Customer service: the Klarna experiment. Klarna, the Swedish fintech company, ran what amounts to a live test of AI disruption against its own operations. The company partnered with OpenAI in 2023 and, in early 2024, deployed an AI assistant that handled about two-thirds of customer chats: approximately 2.3 million conversations in over 35 languages within its first month. CEO Sebastian Siemiatkowski claimed the AI was doing the work of 700 customer service agents. Klarna stopped hiring, let attrition reduce headcount from 5,500 to roughly 3,400, and publicly declared that AI could already do all human jobs.

By mid-2025, Klarna reversed course. Customer satisfaction had dropped. The AI handled routine queries well but failed on anything requiring empathy, nuance, or complex problem-solving. Siemiatkowski admitted the company had “focused too much on efficiency and cost” and that the result was “lower quality.” Klarna began rehiring human agents in a freelance model while repositioning AI as a support tool rather than a replacement.

Christensen would recognise the pattern instantly. Klarna treated a disruptive technology as if it were a sustaining one, as if you could swap it into the existing value chain and get the same outcomes at lower cost. But the “job” the customer was hiring customer service to do was not “answer my question quickly.” It was “make me feel heard, resolve my specific situation, and give me confidence that someone is accountable.” The AI performed the functional dimension of the job but failed the social and emotional dimensions. JTBD analysis would have predicted this. Klarna’s values (margin improvement, headcount reduction, operational efficiency) were precisely the sustaining-innovation values that Christensen warned would misdirect the application of a disruptive technology.

But here is what makes LLM disruption different from disk drives. When Klarna reversed course, the AI had not stopped improving. The models available in early 2026 are substantially more capable than those Klarna deployed in 2024. The question is not whether AI customer service failed. It is whether the failure was permanent or temporary: whether the gap between what the AI could do and what the job required is closing at the rate METR’s trend implies. If it is, then Klarna’s reversal is not a vindication of human customer service. It is an interlude.

Content creation: the Duolingo acceleration. Duolingo’s AI-first pivot illustrates the production-side disruption that Christensen’s theory handles well. In April 2025, CEO Luis von Ahn announced that the company would “gradually stop using contractors to do work that AI can handle.” Within days, Duolingo launched 148 new AI-created language courses. Content that would previously have taken a decade to produce manually was completed in under a year.

This is classic Christensen: the AI produces output that is cheaper, faster, and initially somewhat lower quality (“we’d rather move with urgency and take occasional small hits on quality,” von Ahn wrote). The contractors, like Christensen’s incumbent, were serving the existing market well. But the AI’s cost structure makes a fundamentally different value proposition possible: not marginally better courses, but dramatically more courses, covering languages and markets that could never justify the cost of human content creation. The non-consumers (learners of minority languages, niche combinations, low-revenue markets) become accessible. That is textbook low-end and new-market disruption operating simultaneously.

Software development: disruption within disruption. The AI coding tools market reveals something Christensen’s framework did not anticipate: disruption happening so fast that the disruptors are being disrupted before the incumbents have finished responding.

GitHub Copilot, backed by Microsoft, launched in 2021 and by 2025 had reached 20 million users and deployment in 90% of Fortune 100 companies. It is, in Christensen’s terms, the incumbent’s response to AI-assisted development: integrated into the existing IDE ecosystem, designed to sustain existing workflows by making them faster. Copilot is a sustaining innovation. It helps developers do what they already do, more efficiently.

Cursor, built by four MIT graduates, took a different approach. Rather than bolting AI onto an existing editor, they forked VS Code and rebuilt it as an AI-native environment where the model understands entire repositories and makes coordinated changes across multiple files. Cursor reached a $29.3 billion valuation in under two years and hit $1 billion in annualised revenue. Meanwhile, Claude Code introduced a third paradigm: terminal-based agentic coding where you hand a task to the AI and it comes back done.

The market fragmented into a near three-way split between Copilot, Cursor, and Claude Code within 18 months. This is not the multi-year disruption cycle of steel minimills. This is architectural disruption at startup speed, where even the disruptors cannot establish a stable position before the next paradigm arrives.

And here is the uncomfortable finding: a 2025 METR randomised controlled trial of 16 experienced open-source developers found that they completed tasks 19% more slowly when using AI coding tools than when working without them, while believing the tools had made them 20% faster. The gap between perceived and actual productivity was nearly 40 percentage points. The industry built on the premise that AI makes programmers dramatically more productive has yet to demonstrate that claim under controlled conditions. Argyris would note that this is precisely the kind of evidence that defensive routines are designed to suppress: the data that challenges the strategic premise on which billions of investment depend.

Legal services: the slow-motion disruption. Legal work follows a different tempo. By early 2026, 64% of legal organisations reported actively integrating LLMs into their workflows, primarily for document review, research, and compliance. The technology handles the low-end work (summarising cases, drafting standard clauses, identifying relevant precedent) while human lawyers retain the high-value work of strategy, advocacy, and judgment.

This looks like the early stages of a textbook Christensen disruption. The AI enters at the bottom of the market, doing work that is too routine or low-margin for experienced lawyers to justify their time on. The legal profession rationally focuses on higher-margin advisory work. Meanwhile, the AI improves. Companies like Darrow and Harvey are building legal-specific AI systems that perform tasks once requiring qualified professionals. The question Christensen would pose: at the current rate of improvement, when does the AI’s capability meet the mainstream client’s needs for all but the most complex matters?

The legal profession’s response so far has been classic incumbent behaviour: absorb the technology as a sustaining tool (AI-assisted lawyers do more billable work), invest in higher-margin services, and move upmarket. Christensen’s framework predicts exactly what happens next.

What LLMs reveal about disruption theory itself. These examples expose a limitation in Christensen’s original framework that matters for anyone leading transformation.

His model assumes that the pace of disruption is slow enough for organisations to choose between responding and not responding. The incumbent has time to study the threat, evaluate the technology, create a separate unit, discover the right jobs to be done, and build a new business model.

When capability doubles every seven months, or every 89 days, that assumption collapses. The planning cycle for creating a separate organisational unit (identifying the opportunity, securing sponsorship, recruiting a team, establishing different processes and values) takes longer than a full doubling of the disruptive technology’s capability. By the time the separate unit is operational, the technology it was designed to exploit has moved on.
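The arithmetic behind that collapse can be sketched in a few lines. This is an illustration only: the two doubling times are the figures quoted above, while the one-year planning cycle and the names `growth_factor` and `PLANNING_CYCLE_DAYS` are assumptions introduced here, not numbers from Christensen or METR.

```python
def growth_factor(elapsed_days: float, doubling_days: float) -> float:
    """How many times over a capability multiplies in elapsed_days,
    if it doubles once every doubling_days (simple exponential growth)."""
    return 2 ** (elapsed_days / doubling_days)


# Doubling times quoted in the text, expressed in days.
SEVEN_MONTHS_DAYS = 7 * 30.44  # approximation using an average month length
FAST_DOUBLING_DAYS = 89

# Hypothetical assumption: it takes one year to stand up a separate unit.
PLANNING_CYCLE_DAYS = 365

slow = growth_factor(PLANNING_CYCLE_DAYS, SEVEN_MONTHS_DAYS)
fast = growth_factor(PLANNING_CYCLE_DAYS, FAST_DOUBLING_DAYS)

print(f"Seven-month doubling: capability grows about {slow:.1f}x in one cycle")
print(f"89-day doubling:      capability grows about {fast:.1f}x in one cycle")
```

Under those assumptions, a unit that takes a year to become operational is designed for a capability that has since grown a little over threefold on the slower trend and roughly seventeenfold on the faster one. Changing `PLANNING_CYCLE_DAYS` to your own organisation's lead time makes the same point in local terms.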

This does not invalidate Christensen. It radicalises him. If the innovator’s dilemma was already difficult when disruption took years, it becomes structurally impossible when disruption takes months. The RPV framework becomes not just a diagnostic but an alarm: every month that existing processes and values constrain the organisation’s response, the gap between what the AI can do and what the organisation permits it to do widens. Stacey’s insight becomes more urgent: in conditions of genuine novelty, the only viable strategy is continuous experimentation at the pace of change, not planned responses to yesterday’s capability.

The organisation that treats AI transformation as a programme with a fixed scope and a multi-year roadmap is making the innovator’s error at double speed. The organisation that builds the capacity for continuous, small-scale experimentation, what Weick called “small wins”, has a chance of keeping pace. Not because it can predict where the technology is going, but because it can respond to where the technology actually is, at the tempo the technology demands.

(An Organisational Prompt is something you can do now...)

The Cannibalisation Question

Find the person who controls your AI initiative’s budget. Ask them: “If this initiative could create a new revenue stream worth £2 million but required cannibalising an existing stream worth £8 million, would you approve it?”

The speed of the answer tells you more than the answer itself. If there is hesitation, qualification, or a redirect to “let’s discuss that offline,” your organisation’s values permit sustaining innovation only. That is not a moral failing; it is a structural fact. But it means your AI transformation will produce efficiency gains, not disruption, regardless of what the strategy deck promises.

Further Reading

Clayton Christensen: The Innovator’s Dilemma: When New Technologies Cause Great Firms to Fail - The foundational work. Read it for the disruption mechanism and the RPV framework. The disk drive case studies remain the most rigorous exposition; the principles apply without modification to AI adoption in enterprises.

Clayton Christensen, Michael Raynor: The Innovator’s Solution: Creating and Sustaining Successful Growth - The prescriptive companion to The Innovator’s Dilemma. Read it for the structural responses to disruption, including the separation strategy and the RPV diagnostic applied to new growth businesses.

Clayton Christensen, Taddy Hall, Karen Dillon, David Duncan: Competing Against Luck: The Story of Innovation and Customer Choice - The most complete statement of Jobs to Be Done theory. Read it for the methodology of purpose clarity; the milkshake story is chapter one, but the depth of the framework emerges across the full text.

Clayton Christensen, Michael Horn, Curtis Johnson: Disrupting Class: How Disruptive Innovation Will Change the Way the World Learns - Disruption theory applied to education. Read it for the pattern of applying the theory beyond commercial markets, and for the insight that modular architectures enable disruption in knowledge-intensive domains.

Clayton Christensen: How Will You Measure Your Life? - Christensen applies his business frameworks to personal decisions. Read it for the argument that the same structural forces that cause corporate failure (optimising for short-term metrics, neglecting investments that do not show immediate returns) operate in individual careers.

Jill Lepore: The Disruption Machine - The most prominent critique of disruption theory. Read it for the evidentiary challenges, the limitations of the case study method as applied to prediction, and the broader cultural implications of making “disruption” a managerial imperative.

METR: Measuring AI Ability to Complete Long Tasks - The time-horizon metric showing that the length of software tasks AI agents can complete autonomously has been doubling approximately every seven months since 2019. Essential context for understanding why LLM disruption operates on a fundamentally compressed timeline. Updated in Time Horizon 1.1 (2026), which shortened the post-2023 doubling time to approximately 89 days.

What Can We Learn About Decisions from Commanding on a Battlefield

Justin Arbuckle — Mon, 11 May 2026 07:00:39 GMT

Your governance framework has an approval process. It has a risk register. It has a board that meets monthly. Every project is assessed, scored, reviewed, and either approved or rejected. The process is thorough, well-documented, and entirely satisfying to the compliance function.

Nobody can tell you who is actually accountable for whether it produces value.

This is the accountability problem. Not the absence of governance, but the presence of governance structures that absorb accountability rather than assigning it. When something goes wrong, the response is not “I am accountable” but “the process was followed.” When a strategy fails to produce results, the response is not “I own this failure” but “the roadmap was approved by the steering committee.” The governance apparatus does not clarify who is responsible. It ensures that nobody is. This pattern is common to many organisations.

Five authors from military backgrounds have spent the last two decades writing about a version of this problem that the military has been solving, imperfectly but persistently, for two hundred years. They come from different services, different decades, and different operational contexts. They arrive at essentially the same architecture. And that architecture has something practical to say about how organisations achieve the clarity required to act decisively under conditions of uncertainty (and complexity), which is exactly the condition that most transformations create.

I grew up in apartheid-era South Africa. It was heavily policed and visibly militarised. I was an activist with NUSAS and the End Conscription Campaign, so I have a different, rather more cautious, approach to military (or militarised) leadership than many others, even though my cousin was a Major in the British Army. Even so, the lessons that we can derive from structures designed to ensure performance in very high-risk situations are very real. Military metaphors do have obvious limitations in enterprise contexts, though. The authors themselves acknowledge this, with varying degrees of rigour. The adversarial framing does not transfer to every business situation. Military command authority has legal force that civilian management does not. Military organisations invest far more in selection and training than most enterprises, which means the trust that enables mission command is warranted by an investment most businesses have not made. I am glad I don’t have to do 20 pull-ups before being employed! The apparent moral clarity of military operations (“defeat the enemy”) has no equivalent in the typical enterprise transformation objective, which is far more ambiguous. These limitations are real, and this article addresses them directly. But the core insight still transfers: accountability and autonomy are not opposites. They are preconditions for each other. And without both, there is no clarity.

1. The Convergence: Five Authors, One Architecture

The five authors I will discuss are David Marquet (Turn the Ship Around!, 2013), Stanley McChrystal (Team of Teams, 2015), Jocko Willink and Leif Babin (Extreme Ownership, 2015; The Dichotomy of Leadership, 2018), Stephen Bungay (The Art of Action, 2011), and Jim Mattis (Call Sign Chaos, 2019). Their backgrounds span the US Navy submarine force, US Joint Special Operations Command, US Navy SEALs, British military history and management consulting, and the US Marine Corps. They write in different registers: Marquet is analytical, McChrystal is systemic, Willink is motivational, Bungay is scholarly, Mattis is reflective. They disagree about important things.

What they agree on is an architecture. Expressed in slightly different vocabularies, each author describes three elements that must be present simultaneously for an organisation to achieve clarity of purpose and coherent action.

Shared understanding. Marquet calls it clarity. McChrystal calls it shared consciousness. Bungay calls it alignment around intent. Mattis calls it centralised vision. Boyd, whose work undergirds all five, calls it Einheit: a shared outlook born of common experience and mutual trust. The principle is the same: before you can delegate authority, everyone must understand what the organisation is trying to achieve and why. Not in the abstract language of a strategy slide, but with enough precision that any person, confronted with an unforeseen situation, can determine what action would serve the purpose without asking for permission.
Delegated authority. Marquet calls it control pushed down to where the information lives. McChrystal calls it empowered execution. Willink calls it decentralised command. Bungay calls it autonomy around actions. Mattis calls it decentralised planning and execution. The principle: the people closest to the work must have the authority to decide how to achieve the intent, because they have information that the leader cannot possess and the situation will change faster than the approval chain can process.
Maintained accountability. This is where the five authors are most distinctive and where their contribution to the clarity problem is most direct. Accountability is not surveillance. It is not approval chains, dashboards, or compliance checks. It is the structural condition in which a named person owns the outcome: not the process, not the inputs, not the effort, but the result. And that ownership is what makes the first two elements safe. Shared understanding without accountability produces a well-informed organisation that still cannot act. Delegated authority without accountability produces chaos. Accountability without shared understanding and delegated authority produces micromanagement. All three must act together.

This architecture is old. It traces to the Prussian military reforms of the early nineteenth century, through Moltke the Elder’s Auftragstaktik (mission-type tactics), through Boyd’s organic design principles, to its contemporary expression in these five authors.

2. Bungay’s Three Gaps: Why Instinctive Reactions Destroy Clarity

Bungay provides a rigorous diagnosis. Drawing on two centuries of military history and seventeen years at the Boston Consulting Group, he identifies three gaps that prevent organisations from turning strategy into results.

The knowledge gap is the difference between what we would like to know and what we actually know. In an AI transformation, the knowledge gap is enormous: nobody knows with certainty which AI applications will create value, which roles will change, or how the technology will evolve. The instinctive reaction to this gap is to demand more information, more analysis, more readiness assessments, more maturity models. Bungay’s insight is that this reaction widens the gap rather than closing it. More analysis consumes time and creates false precision. The relevant knowledge only emerges from doing. Weick made the same argument about sensemaking: you cannot know what you think until you see what you say. Bungay adds the organisational mechanism by which this insight is systematically suppressed.
The alignment gap is the difference between what we want people to do and what they actually do. Communication is inherently lossy; what is intended by the sender is not what is understood by the receiver. The instinctive reaction is to provide more detailed instructions, more controls, more oversight. Bungay shows that this widens the alignment gap by removing the subordinate’s ability to adapt to local conditions. The more precisely you specify the actions, the less able people are to respond to situations the specification did not anticipate.
The effects gap is the difference between what we expect our actions to achieve and what they actually achieve. The environment is unpredictable; actions produce unintended consequences; other actors respond in unexpected ways. The instinctive reaction is to tighten control, increase reporting, and demand compliance. Bungay shows that this widens the effects gap by preventing the adaptation that would close it.

The critical insight is that the instinctive reaction to each gap (more detail, more control, more reporting) makes all three gaps worse simultaneously. The organisation caught in this cycle produces increasingly elaborate plans that are increasingly disconnected from reality.

Bungay’s resolution is what he calls directed opportunism: high alignment and high autonomy at the same time. This is not a compromise between centralised control and decentralised chaos. It is a different architecture entirely, one in which alignment is achieved around intent (what to achieve and why) and autonomy is granted around actions (what to do and how). Mintzberg would recognise this as emergent strategy given a mechanism. Stacey would recognise it as skilled participation in ongoing processes. Peters would recognise it as “simultaneous loose-tight properties” given structural expression.

The practical mechanism is the briefing and backbriefing cascade. Leadership communicates intent downward, adding specificity at each level to the tasks implied by the higher intent. Subordinates explain their understanding of the intent and their planned actions upward. This two-way exchange catches misalignment before execution begins. It is enacted sensemaking: the subordinate articulates their understanding, and the leader can see whether it is adequate before action begins.

Argyris would note that this process makes the theory-in-use visible and testable, which is precisely what defensive routines prevent. Bungay’s backbriefing is a structural mechanism for surfacing the undiscussable.

For AI transformation, the application is obvious. Leadership communicates AI intent: “We want to use AI to reduce the time from specification to working prototype by 50%.” Teams backbrief how they intend to achieve this in their specific context. Leadership corrects misunderstanding before execution begins. The strategy emerges from the pattern of successful experiments conducted by empowered people operating within a shared understanding of what they are trying to achieve. This is also Drucker’s Management by Objectives done properly: objectives cascade downward; understanding cascades upward; the result is alignment without micromanagement.

3. Marquet’s Three Pillars: Why Clarity Cannot Exist Without Competence and Control

Marquet provides a diagnostic. His experience commanding the USS Santa Fe, at the time the worst-performing submarine in the US Navy fleet, taught him that the leader-follower model is designed to produce compliance, not thinking. The catalytic moment came during a drill when Marquet ordered “ahead two-thirds” and the officer of the deck repeated the order, even though no such setting existed on that class of submarine. The entire crew had been trained to comply, not to think. The officer repeated an impossible order because “you told me to.”

Marquet’s response was to invert the authority structure. Instead of the subordinate asking “request permission to submerge the ship” and the leader deciding, the subordinate would state “I intend to submerge the ship” and the leader would assent or redirect. The difference is profound: with permission, the default is stasis absent approval; with intent, the default is action absent a veto. The person with the most knowledge states what they plan to do. The leader’s role shifts from deciding to certifying.
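The difference between the two defaults can be made concrete with a minimal sketch. The names below (Response, proceeds_under_permission, proceeds_under_intent) are hypothetical and chosen only for illustration; they are not Marquet's terminology. The assumption is that a leader can approve, veto, or simply not respond.

```python
from enum import Enum
from typing import Optional


class Response(Enum):
    """Hypothetical leader responses to a subordinate's statement."""
    APPROVE = "approve"
    VETO = "veto"


def proceeds_under_permission(leader_response: Optional[Response]) -> bool:
    # "Request permission to...": act only on explicit approval.
    return leader_response is Response.APPROVE


def proceeds_under_intent(leader_response: Optional[Response]) -> bool:
    # "I intend to...": act unless the leader explicitly vetoes.
    return leader_response is not Response.VETO


# None models a leader who has not responded (busy, absent, or uninformed).
for response in (Response.APPROVE, Response.VETO, None):
    print(response,
          "| permission:", proceeds_under_permission(response),
          "| intent:", proceeds_under_intent(response))
```

The two rules agree whenever the leader responds. They differ only when the leader is silent, which is exactly the case that determines whether an organisation moves or waits.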

But Marquet discovered that you cannot simply hand control to people who lack the competence to use it or the clarity to direct it. His three pillars, control, competence, and clarity, must rise together.

Control without competence produces chaos: people making decisions they are not equipped to make. Competence without clarity produces misalignment: skilled people working at cross-purposes because they do not understand the organisation’s intent. Clarity without control produces frustration: people who know what needs to be done but lack the authority to do it.

This maps directly to the AI transformation challenge. Many organisations have pushed AI tools to their teams (control) without investing in the knowledge required to use them well (competence) and without articulating what the organisation is trying to achieve through AI (clarity). The result is predictable: people experiment randomly, outcomes vary wildly, and leadership concludes that AI “isn’t ready” or “isn’t delivering enough” or the teams “aren’t mature enough.” Marquet’s diagnosis is that the problem is not the people. It is the model. The leader-follower model, in which leadership decides and teams execute, cannot produce clarity because it structurally prevents the information that would create clarity from reaching the people who need it.

The connection to Heifetz is clear. Heifetz argues that adaptive challenges cannot be solved on behalf of others. The leader who provides the answer to an adaptive challenge is not leading; they are performing leadership while preventing the learning that would produce a genuine answer. Marquet’s “I intend to” mechanism is a structural implementation of Heifetz’s principle: it returns the work to the people with the problem while maintaining the leader’s accountability for the system within which the work is done.

4. McChrystal’s Shared Consciousness: Why Accountability Requires Transparency

McChrystal faced a different version of the problem. In 2003, the Joint Special Operations Command confronted Al Qaeda in Iraq: a loose network of small, independent cells that moved faster than the US military’s hierarchical decision-making could process. Despite vastly superior resources, manpower, and training, the coalition was losing. The wait for McChrystal’s approval was not resulting in better decisions, and the priority needed to be reaching the best possible decision in a time frame that allowed it to be relevant.

McChrystal’s solution was radical transparency. Seven thousand people attended daily Operations and Intelligence briefings for up to two hours. Embedding and liaison programmes built trust across team boundaries. Information sharing reached levels that were “entirely new to both organisations.” The purpose was not to create a well-informed hierarchy. It was to create the shared consciousness that would make empowered execution safe.

The order matters, and this is McChrystal’s distinctive contribution. Shared consciousness must precede empowered execution. Without shared understanding, decentralised action produces chaos. Without decentralised authority, shared understanding produces frustration. Neither suffices alone.

This is Senge’s shared vision given an operational mechanism: not a statement on a slide but a daily practice of radical transparency that ensures everyone understands the whole picture, not just their part.

McChrystal’s accountability model works through transparency, not hierarchy. Everyone sees the consequences of their decisions because information is shared. The general thinks out loud so that thousands can learn the decision-making framework, not just the decision. The accountability is maintained not by an approval chain but by the visibility of outcomes to everyone who participated in creating them.

The shift in leadership metaphor is significant. McChrystal describes moving from “heroic leader” to “humble gardener.” The leader’s job becomes creating and maintaining the ecosystem: ensuring information flows, connecting teams, building trust, not making every decision. “Eyes on, hands off”: leaders see everything through shared consciousness but resist the urge to control. The urge they must resist is what McChrystal calls the Perry Principle: when leaders can see what is going on, they understandably want to control what is going on. Under that principle, empowerment is a tool of last resort, used only when the leader runs out of attention, not a design principle.

For AI transformation, McChrystal’s insight challenges the standard governance model. Most AI governance creates information asymmetry: the governance board knows what is approved, the delivery teams know what is possible, and neither sees the other’s reality. McChrystal would argue that the precondition for empowered AI experimentation is not a better approval process but a better information architecture: one in which everyone can see what is being tried, what is working, what is failing, and why.

5. Willink’s Dichotomy: Why Accountability Is Not What You Think It Is

Willink provides the emotional and dispositional dimension. His principle of extreme ownership is frequently misunderstood as a demand for micromanagement: the leader controls everything and is therefore responsible for everything. The Dichotomy of Leadership, the sequel to Extreme Ownership, exists precisely to correct this misreading.

The core dichotomy is: “hold people accountable, but don’t hold their hands.” Accountability without autonomy produces compliance. Autonomy without accountability produces chaos. The leader must hold both simultaneously, not as a compromise but as a dynamic tension that must be navigated continuously.

Willink’s most memorable demonstration is the experiment with boat crews during SEAL training. When the leader of the best-performing crew was swapped with the leader of the worst-performing crew, performance followed the leader, not the team. “No bad teams, only bad leaders.” This is a radical version of Argyris’s insight that defensive routines are produced by leadership behaviour. The leader’s Model I behaviour creates the team’s dysfunction.

But the deeper principle is diagnostic. When something goes wrong, the leader’s first question should be “what did I fail to do?” not “who failed?” Extreme ownership drives a specific behaviour: instead of blaming subordinates, the leader examines what they failed to communicate, train, resource, or clarify.

This is Dekker’s local rationality principle applied to leadership: if the team made a bad decision, it is because the system (training, information, clarity, resources) made that bad decision rational from their perspective. And the system is the leader’s responsibility.

6. Mattis and the Three Phases: Why Clarity Is a Leadership Maturity Problem

Mattis contributes a developmental insight. He distinguishes three phases of leadership: direct (leading those you can see), executive (leading through others), and strategic (leading institutions). At each phase, the relationship between accountability and clarity changes.

At the direct level, the leader can see the problem, assess the situation, and act. Clarity is achieved through proximity. At the executive level, the leader must achieve clarity through others: communicating intent clearly enough that people the leader has never met can make good decisions. At the strategic level, the leader must create the conditions under which clarity can emerge across an entire institution.

Mattis’s formulation is clear: “centralised vision, decentralised planning and execution.” He deliberately rejects the more common “centralised planning and decentralised execution” as too top-down. The vision is centralised; the planning is not. This means that the people closest to the work plan how to achieve the intent, not merely execute a plan that has been handed to them. This is Taylor’s separation of thinking from doing explicitly rejected by a four-star general who spent forty-four years testing the alternative.

Mattis also provides a principle that connects accountability to learning: rehearsal until improvisation is possible. You prepare thoroughly, train intensively, and rehearse repeatedly, not to follow the plan but to develop the competence that allows you to improvise intelligently when the plan fails. The accountability is for the preparation, not for the prediction.

7. Accountability Sinks: Where Clarity Goes to Die

Dan Davies, drawing on Beer’s cybernetics, provides the concept that connects the military insight to the enterprise reality.

An accountability sink is a system in which decisions are delegated to rule books, standard procedures, or committee structures in a way that makes it impossible to identify who is responsible for outcomes.

Beer anticipated this: the principle of diminishing accountability states that unless conscious steps are taken to prevent it, any organisation will tend to restructure itself so as to reduce the amount of personal responsibility attributable to its actions.

Every rule is a model of the world. When the model is wrong, there is nobody to blame because everyone followed the rules. The surefire sign is the claim that “nobody is responsible”: everyone did everything right, and yet the results were bad. In military contexts, accountability sinks manifest as rules of engagement so detailed that soldiers follow the letter while the spirit is lost.

This is the accountability paradox that all five authors navigate. Too much control stifles initiative, speed, and adaptation. Too little accountability enables moral drift, misconduct, and systemic failure. The resolution is accountability for creating the conditions in which good decisions are made, not for making every decision yourself. The commander is accountable for the system: the training, the culture, the ethical boundaries, the detection mechanisms. Subordinates are accountable for their actions within the delegated authority.

Beer’s concept of algedonic alerts provides the structural mechanism. An algedonic alert is a pain signal that bypasses normal channels: a mechanism for frontline signals of failure to reach senior leadership without being filtered, delayed, or absorbed by intermediate layers. McChrystal’s daily O&I briefings served this function: radical transparency meant that problems could not be hidden within the hierarchy. The After-Action Review is a structured algedonic alert: it forces the system to confront the gap between intended and actual outcomes.

Westrum’s typology maps directly. In a pathological culture, accountability sinks protect the powerful; information about failure is suppressed. In a bureaucratic culture, accountability sinks are the standard operating procedure; the process is always followed and nobody is ever responsible. Only in a generative culture does accountability function as these military authors describe: named individuals own outcomes, information flows to where it is needed, and the system learns from the gap between intention and result.

8. What Transfers and What Does Not

The military-to-enterprise transfer is not automatic. Several conditions that make mission command work in military contexts are absent or weaker in enterprise settings. What does transfer is the architecture itself and the diagnostic it provides. The three questions that emerge from these five authors can be asked of any organisation:

Does everyone who needs to make decisions about AI understand, with enough precision to act without asking permission, what the organisation is trying to achieve and what constitutes an unacceptable outcome? If not, shared understanding is absent, and no amount of delegated authority will produce coherent action.
Do the people closest to the work have the authority to decide how to use AI within the boundaries of the organisation’s intent? If not, delegated authority is absent, and the organisation is running Taylor’s separation of thinking from doing in contemporary language.
When something goes wrong, can you name the person who is accountable for the outcome (not the process, not the committee, not the board)? If not, accountability has been absorbed by a sink, and the organisation cannot learn from failure because there is nobody whose job it is to learn.

9. The Connection to Specification

The Deciding phase of this series argues that AI has changed the means of production of knowledge, and that achieving clarity of purpose now requires the ability to specify intent with enough precision that machines can act on it. The military authors provide a complementary argument about the human and organisational preconditions for that specification to work.

A specification without accountability is a document. A specification with accountability is a commitment. When a named person owns the outcome that the specification describes, the specification becomes more precise (because the accountable person has an incentive to remove ambiguity), more testable (because the accountable person needs to know whether the outcome was achieved), and more honest (because the accountable person cannot hide behind vagueness).

Bungay’s briefing-backbriefing cascade is the specification review process described in military terms. The specification author communicates intent; the implementing team explains their understanding; misalignment is caught before execution.

Marquet’s “I intend to” mechanism is what specification-driven development looks like when accountability is real. The team does not ask “may we implement this specification?” The team states “I intend to implement this specification, and here is how I interpret it.” The leader assents or redirects. The team owns the implementation. The leader owns the system in which the implementation occurs.

McChrystal’s shared consciousness is the information architecture that enables specifications to be coherent across an enterprise. When everyone can see what is being specified, what is being built, and what is working, the specifications become part of a shared understanding rather than isolated documents that drift apart.

Willink’s extreme ownership applied to AI means: if the AI-generated output is wrong, the leader who deployed it owns the failure. Not the AI, not the vendor, not the team who built the prompt. The specification was the leader’s responsibility. The validation was the leader’s responsibility. The decision to deploy was the leader’s responsibility. This is the accountability that governance frameworks must create rather than absorb.

10. Where It Breaks Down: The Honest Assessment

A 2023 study published in Military Psychology applied Deci and Ryan’s Self-Determination Theory to mission command and found that the central aspects of mission command (empowerment, mutual trust, intent, initiative, shared understanding) map directly onto the satisfaction of the three basic psychological needs: autonomy, competence, and relatedness. Mission command is not merely a management technique. It is a motivational architecture. It works not because it is efficient but because it aligns with fundamental human psychological needs. This explains why directive command, even when faster in the short term, produces disengagement and learned helplessness in the long term.

But the tensions between the five authors reveal genuine unresolved problems.

Willink’s extreme ownership places the leader at the centre of accountability. McChrystal’s humble gardener places the leader at the edge, tending the ecosystem. Both are right in different contexts, but the advice conflicts if you try to apply both simultaneously. Bungay’s directed opportunism provides the resolution: alignment is leadership’s responsibility; autonomy is the team’s responsibility. Both must rise together.

Marquet’s inversion of authority (“I intend to”) distributes ownership to the team. Willink’s “no bad teams, only bad leaders” concentrates it in the leader. The resolution is that these describe different moments in the same process: the leader creates the conditions (Willink), and the team acts within them (Marquet).

All five authors assume relatively clear organisational boundaries, a unified chain of command, and a shared mission. Most enterprise contexts involve matrix structures, competing priorities, and ambiguous authority. Fayol’s warning about dual command is directly relevant: the matrix organisation is a structural accountability sink. When you report to both a functional lead and a delivery lead, the accountability for AI outcomes falls between the two.

The honest assessment is that mission command transfers to enterprise transformation as a diagnostic more reliably than as a prescription. The three questions (shared understanding? delegated authority? named accountability?) reveal precisely where clarity is breaking down. The solutions require the cultural, structural, and habitual changes that the Learning phase of this series described: the seven conditions for organisational learning are the preconditions for the accountability architecture that the military authors prescribe.

Further Reading

Stephen Bungay: The Art of Action: How Leaders Close the Gaps between Plans, Actions and Results - The most rigorous and scholarly of the five. Start here for the diagnostic framework. The three gaps and directed opportunism are immediately applicable to any organisation struggling with the distance between strategy and execution.

David Marquet: Turn the Ship Around! A True Story of Turning Followers into Leaders - The most practical mechanism for shifting from compliance to initiative. The “I intend to” language is something you can implement on Monday morning.

Stanley McChrystal: Team of Teams: New Rules of Engagement for a Complex World - The argument for radical transparency as the precondition for empowered execution. Read it alongside Westrum for the cultural dimension that McChrystal demonstrates but does not theorise.

Jocko Willink and Leif Babin: Extreme Ownership: How U.S. Navy SEALs Lead and Win - The emotional energy and the leadership disposition. Follow with The Dichotomy of Leadership (2018) for the necessary correction: every principle becomes a liability when taken to its extreme.

Jim Mattis and Bing West: Call Sign Chaos: Learning to Lead - The developmental arc from direct to executive to strategic leadership, and the most precise formulation: “centralised vision, decentralised planning and execution.”

John Boyd: Patterns of Conflict - The intellectual foundation beneath all five authors. Available online. Dense and rewarding.

Dan Davies: The Unaccountability Machine: Why Big Systems Make Terrible Decisions - The concept of accountability sinks applied to contemporary institutions.

(An Organisational Prompt is something you can do now....)

Organisational Prompt

Pick a specific initiative in your organisation that has stalled, underperformed, or produced ambiguous results. Not the whole transformation; one concrete initiative.

Ask three questions.

Can the people working on this initiative articulate, without consulting a document, what the organisation is trying to achieve through this initiative and what would constitute an unacceptable outcome?
Did the people closest to the work have the authority to determine how to pursue the initiative, or were they given a plan and told to execute?
When the initiative underperformed, could you name one person who was accountable for the outcome?

Now ask the harder question: if you could implement one change tomorrow, which of the three would you address first?

Stafford Beer: The Purpose of a System is What it Does

Justin Arbuckle — Thu, 07 May 2026 07:01:49 GMT

A quick recap: The Deciding phase of this series rests on three levers, the same three that governed the Learning phase, now applied to a different object. In Learning, the object was the organisation’s capacity to learn. In Deciding, the object is the domain’s capacity to be understood, described, and structured so that decisions can be made. Simon governs the Identity lever: bounded rationality constrains what is available to the decision-maker, just as Bourdieu’s habitus constrains what is available to the learner. Evans governs the Information lever: the precision and pathology of domain description determines what can be specified, just as Bateson’s double bind determines what can be communicated. The third lever is Interaction: how the parts of a system relate to each other, and whether those relationships produce viable decisions or merely the appearance of them.

In the Learning phase, Illich governed Interaction. His critique was devastating and precise: institutions designed to serve a purpose tend, over time, to replace the interaction they were meant to enable. The school prevents learning. The hospital prevents health. The training programme prevents development. The structure colonises the activity. Illich diagnosed the pathology. He did not provide the architectural alternative.

Stafford Beer does. His Viable System Model is a testable structural claim about the information flows that any system requires to remain viable in a changing environment. Where Illich shows you what institutional pathology looks like, Beer shows you what healthy interaction requires. Where Simon tells you that decisions are bounded and Evans tells you that descriptions must be precise, Beer tells you that neither constraint matters if the architecture through which decisions flow cannot process the variety the environment presents. You can have brilliant, well-bounded decision-makers working from precise domain models, and the system will still fail if the interaction between its parts cannot absorb what the environment throws at it.

Beer spent four decades building this argument. He began in operations research at United Steel in the 1950s, installed one of the first computers dedicated to management cybernetics, co-founded SIGMA consultancy, led the extraordinary (although ill-fated) Project Cybersyn in Allende’s Chile, and held positions at nearly thirty universities worldwide. His key works span from Cybernetics and Management (1959) through Brain of the Firm (1972) and The Heart of Enterprise (1979) to Diagnosing the System for Organizations (1985). His motto was “absolutum obsoletum”: if it works, it is out of date. He meant it as a cybernetic principle, not a joke.

1. POSIWID: The Design Constraint That Collapses Intent into Function

Beer’s most famous contribution is a heuristic: POSIWID, the Purpose Of a System Is What It Does. In the Learning phase, this served as a diagnostic for spotting the gap between what organisations say they do and what they actually do. Applied to the Deciding phase, it becomes something harder: a design constraint on decision architecture itself.

The Deciding phase hypothesis is that decisions are design challenges, and design is a sequence of decisions under constraint. POSIWID is the constraint that collapses the distinction between intended and actual design. You do not evaluate a decision architecture by its stated intent. You evaluate it by the decisions it actually produces. If your AI governance board was designed to enable responsible adoption and it actually produces delay, then delay is its purpose. The design has produced exactly the interaction it was structured to produce. This is not a failure of the people involved. It is a feature of the architecture.

The connection to Anscombe is direct. Her test of intentional action asks whether the people involved can trace the “Why?” chain from what they are doing to what the organisation is trying to achieve. POSIWID asks the same question of the system. Under which description is the governance board’s activity intentional? The members can trace their “Why?” chain to “because the process requires review before deployment.” That is a genuine reason connecting to a genuine purpose. The problem is that the purpose it connects to is not transformation but compliance. The compliance activity is intentional. The transformation is not. Both descriptions are true. Only one is connected to reasons the participants can articulate. Beer and Anscombe, from entirely different traditions, converge on the same insight: look at what the system does, not what it says.

This reframes the leader’s task. The question is not “why are we making bad decisions?” as though the participants are deficient. The question is “what interaction has this architecture been designed to produce?” Because the architecture is producing precisely the decisions it was built to produce. Changing the decisions means redesigning the interaction.

2. Requisite Variety: Why Your Decision Architecture Cannot Process What It Needs

Beer built his entire framework on a single law from cybernetics, formulated by W. Ross Ashby: only variety can absorb variety. A system that must respond to a complex environment needs at least as much internal variety (possible states, possible responses) as the environment presents. A thermostat works because its two states, heating on and heating off, match the only two environmental states that matter to it: too cold and warm enough. An organisation’s environment has effectively infinite variety: customers, competitors, regulators, technologies, economic shifts, and social changes generate more possible states than any management system can enumerate, let alone process.
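A toy model makes the law tangible. The sketch below is an illustration under stated assumptions, not Ashby's or Beer's own formulation: the outcome table (d + r) % n is hypothetical, chosen so that no single response can make two different disturbances look the same. Under that assumption, a regulator with k responses facing n disturbances cannot reduce the outcomes to fewer than n divided by k, rounded up. The function name min_outcome_variety is also an assumption.

```python
from itertools import product
from math import ceil


def min_outcome_variety(n_disturbances: int, n_responses: int) -> int:
    """Brute-force the best any regulator strategy can do in the toy model.

    A strategy assigns one response r to each disturbance d. The outcome
    is (d + r) % n_disturbances, a hypothetical table chosen for illustration.
    """
    best = n_disturbances
    for strategy in product(range(n_responses), repeat=n_disturbances):
        outcomes = {(d + r) % n_disturbances for d, r in enumerate(strategy)}
        best = min(best, len(outcomes))
    return best


# Six possible disturbances, regulators with increasing variety of response.
for k in (1, 2, 3, 6):
    print(f"responses={k}  best achievable outcome variety="
          f"{min_outcome_variety(6, k)}  lower bound={ceil(6 / k)}")
```

Only the regulator with as many responses as there are disturbances can hold the system to a single outcome. Every regulator with less variety leaves some of the environment's variety unabsorbed, however cleverly it chooses.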

Organisations manage the gap through two mechanisms: attenuation (reducing incoming variety by filtering, summarising, categorising) and amplification (increasing outgoing variety by empowering local responses, diversifying capabilities, broadening the repertoire of action). Every report that compresses a hundred data points into three RAG ratings is attenuation. Every team empowered to respond to local conditions without seeking approval is amplification. Both are necessary. The pathology is in the ratio.

Most organisations are far better at attenuation than amplification. They filter, summarise, aggregate, and simplify until the information reaching decision-makers bears almost no resemblance to the reality on the ground. Beer’s warning is blunt: “the lethal variety attenuator is sheer ignorance.” Every decision about what to measure is implicitly a decision about what not to measure, and therefore a decision about what environmental variety to ignore. When the AI transformation dashboard shows “87% of staff have completed AI training,” it has attenuated away the only information that matters: whether anyone’s actual work has changed.
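What attenuation discards can be shown in a few lines. This is a minimal sketch with invented data: the thresholds, the scores, and the function name rag_rating are all assumptions for illustration, not a standard RAG definition.

```python
from statistics import mean


def rag_rating(scores, amber=0.5, green=0.8):
    """Attenuate many data points into a single red/amber/green rating.

    The thresholds are hypothetical; real reporting schemes vary.
    """
    average = mean(scores)
    if average >= green:
        return "green"
    if average >= amber:
        return "amber"
    return "red"


# Hypothetical data: 100 scores between 0 and 1 for each of two units.
steady_unit = [0.85] * 100             # everyone doing reasonably well
split_unit = [1.0] * 85 + [0.0] * 15   # fifteen outright failures inside the average

for name, scores in (("steady", steady_unit), ("split", split_unit)):
    print(name, rag_rating(scores), "| worst score:", min(scores))
```

Both units report the same rating, yet one of them contains fifteen complete failures. The rating is not wrong. It has simply filtered out the variety that a decision-maker would most need to see.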

For the Deciding phase, this is not merely an observation about information processing. It is a constraint on the quality of decisions the organisation can make. Simon’s bounded rationality is cognitive: the decision-maker cannot process everything. Beer’s requisite variety is structural: the architecture must be designed so that what reaches the decision-maker is worth processing. These are different constraints operating at different levels, and both must be met. An organisation can have the most rational, least biased decision-makers in the industry and still fail because the architecture feeding them information has attenuated the variety to the point where the decisions, however rational, are answers to the wrong questions.

The AI context makes this acute. AI massively amplifies the variety of possible outputs from any given input. A single specification can generate dozens of possible implementations. A single prompt can produce analyses that would have taken a team weeks. The organisation’s existing decision architecture, its review processes, approval chains, quality gates, was calibrated for human-speed, human-variety work. AI introduces variety that exceeds the requisite variety of the existing control systems. The typical organisational response is to attenuate: restrict approved models, constrain outputs to approved patterns, limit use cases to the familiar. This reduces AI’s value to fit the organisation’s existing variety capacity. Beer would argue the viable response is the opposite: amplify organisational variety through new coordination mechanisms, new forms of local autonomy, new channels for information flow, so the decision architecture can match what AI makes possible. Attenuation is the reflex. Amplification is the design challenge.

3. The Viable System Model: An Architecture for Decision Interaction

The VSM describes five interacting systems required for viability. Beer derived them not from management theory but from the architecture of the brain and nervous system. This is not metaphor. Beer claimed, and spent decades defending, the position that the structural requirements of any viable system are isomorphic: the same cybernetic description applies regardless of the substrate. The claim is strong, and it is testable, which is more than can be said for most management frameworks.

System 1: Operations. The primary activities that produce value. Multiple operational units, each interacting with its own portion of the environment. Each System 1 unit is itself a viable system; this is recursion, which I will return to. The critical principle: operational units must be as autonomous as possible. This is not a management philosophy. It is a cybernetic requirement. Without autonomy, the system lacks the variety to match its local environment. Beer characterised each System 1 through a triple vector: actuality (what we manage to do now), capability (what we could do with existing resources if we really worked at it), and potentiality (what we ought to be doing by developing resources and removing constraints). From these he derived three measures: productivity is actuality divided by capability; latency is capability divided by potentiality; performance is actuality divided by potentiality. Most organisations obsess over productivity and ignore latency entirely. The gap between what you could do and what you ought to be doing is invisible because nobody is measuring it. System 4’s job, as we will see, is essentially to realise potentiality. But it can only do this if someone has named the gap.
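The three measures are simple ratios, and a worked example shows why watching productivity alone misleads. The sketch below uses invented figures; the class name TripleVector and the units are assumptions for illustration. The ratios themselves follow the definitions given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TripleVector:
    """Beer's three levels of achievement for one operational unit.

    All three values must be in the same (hypothetical) unit of output.
    """
    actuality: float     # what we manage to do now
    capability: float    # what we could do with existing resources
    potentiality: float  # what we ought to be doing with constraints removed

    @property
    def productivity(self) -> float:
        return self.actuality / self.capability

    @property
    def latency(self) -> float:
        return self.capability / self.potentiality

    @property
    def performance(self) -> float:
        return self.actuality / self.potentiality


# Invented figures: a unit that looks excellent on productivity alone.
unit = TripleVector(actuality=90, capability=100, potentiality=400)
print(f"productivity={unit.productivity:.3f}")
print(f"latency={unit.latency:.3f}")
print(f"performance={unit.performance:.3f}")
```

Because performance is productivity multiplied by latency, a unit can score highly on productivity while its overall performance stays low. In this invented case the unit is working close to its current capability, and almost all of the shortfall sits in the gap between capability and potentiality that nobody is measuring.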

System 2: Coordination. The mechanisms that prevent autonomous operational units from shaking the system apart through oscillation and conflict. Shared schedules, communication protocols, standards, resource-sharing agreements. System 2 does not command; it harmonises. The conductor’s timekeeping function: it does not tell the musicians what to play, but it prevents them from playing at different tempos. Without System 2, autonomous teams generate chaos. With too much of it, autonomy is crushed and variety is destroyed.

System 3: Optimisation. The function that looks across the entire cluster of operational units from above and asks: how can the whole work better than the parts in isolation? System 3 allocates resources, establishes what Beer called the resource bargain (operational units perform in exchange for resources), and ensures synergies across operations. System 3 is concerned with the inside and now. System 3* is the audit channel: direct, sporadic access to what is actually happening on the ground, bypassing the normal reporting hierarchy. Without System 3*, the meta-system operates on curated information and cannot know its own reality. This is Argyris’s “making the undiscussable discussable” rendered as information architecture.

System 4: Intelligence. The function that scans the external environment for threats and opportunities and models possible futures. System 4 is concerned with the outside and then. It maintains a model of the environment with sufficient variety to detect relevant change. Its job is to close the gap between capability and potentiality by sensing what the environment will demand next. Evans’s knowledge crunching, the iterative dialogue between developers and domain experts that produces the domain model, is a System 4 activity: it builds the organisation’s model of the domain with enough precision to act on. When System 4 is weak, the organisation makes decisions based on a model of an environment that no longer exists.

System 5: Policy and Identity. The function that defines what the organisation is: its values, its ethos, its ground rules. System 5 is not management. It is identity. Its most critical role is balancing the tension between System 3 and System 4. “Rules come from System 5,” Beer wrote, “not so much by stating them firmly, as by creating a corporate ethos, an atmosphere.” Without System 5, the organisation fragments: Systems 3 and 4 pull in opposite directions, one demanding efficiency and the other demanding adaptation, with no mechanism for resolution.

The parallels to the other Deciding phase governors are structural, not decorative. Simon’s nearly decomposable systems are Beer’s recursive System 1 units: semi-autonomous subsystems with strong internal interactions and weaker inter-system links. The architecture of complexity that Simon described in 1962 is the architecture that Beer operationalised. Evans’s bounded contexts are Beer’s System 1 units seen from the domain modelling perspective: each has its own ubiquitous language, its own model, its own boundary. When Evans says a bounded context needs explicit interfaces to its neighbours, Beer says System 2 must coordinate the interaction between System 1 units. They are describing the same structural requirement in different professional vocabularies.

4. The 3-4 Homeostat: How Organisations Decide What to Decide

The most useful diagnostic in the VSM for the Deciding phase is the balance between System 3 and System 4: the homeostat between inside-and-now and outside-and-then. This is not the exploit/explore tension that every strategy textbook recites. It is the mechanism that governs how organisations decide what to decide.

System 3 decides within the current model. It optimises, allocates, improves. Its questions are operational: how do we do what we are doing better, faster, cheaper? System 4 decides whether the model itself needs changing. It scans, senses, models futures. Its question is strategic in the deepest sense: should we be doing something different entirely?

When System 3 dominates, the organisation makes decisions that optimise the present. AI is used to do current work faster: the Taylorist path, in which the technology serves the existing paradigm. The decision architecture is efficient but blind to environmental change. When System 4 dominates, the organisation generates strategic possibilities it cannot implement. The innovation lab produces brilliant prototypes. The strategy team publishes visionary roadmaps. But System 3 cannot absorb them because it is fully occupied managing today. The result is strategy without execution: a familiar pathology in enterprises that run AI centres of excellence detached from the teams doing the work.

Beer’s architecture requires System 5 to hold this tension without collapsing into either pole. Some teams exploit. Others explore. The identity accommodates both. This is the structural expression of what Ackoff called dissolving a problem: rather than choosing between exploit and explore, you redesign the system so the tension is maintained as a permanent, productive feature. The viable system does not decide between present and future. It maintains the interaction between them.

For AI, the 3-4 homeostat is the strategic diagnostic. Ask: is your organisation’s AI effort dominated by System 3 (making current processes faster) or System 4 (sensing how AI changes what is possible)? If the answer is overwhelmingly System 3, you are optimising your way into irrelevance. If the answer is overwhelmingly System 4, you are strategising your way into impotence. The decision about what to decide, whether to invest in optimisation or exploration, is itself a decision that requires System 5 to hold.

5. Accountability Sinks: How Decision Architecture Absorbs Its Own Feedback

Dan Davies, in The Unaccountability Machine (2024), revived Beer’s ideas for a contemporary audience by naming the mechanism through which organisations destroy the feedback their decision architecture requires. He called them accountability sinks: systems in which decisions are delegated to processes, rule books, or committees, making it impossible to identify who is responsible for outcomes.

Beer anticipated the principle: “Unless conscious steps are taken to prevent it, any organisation in a modern industrial society will tend to restructure itself so as to reduce the amount of personal responsibility attributable to its actions. This tendency will continue until crisis results.”

Accountability sinks absorb variety the way a sponge absorbs water. The feedback that would tell the organisation what is really happening, the pain signal that would force a decision, the clarity that would demand action: all of it is absorbed by the process. Nobody is accountable because everyone followed the process. The AI review board approved the approach. The risk assessment was completed. The governance framework was satisfied. The outcome was a failure that nobody owns.

This is where Beer meets Bourdieu. Beer’s algedonic alerts, pain and pleasure signals that bypass normal channels to reach decision-makers, assume that people will pull the cord when something goes wrong. Bourdieu explains why they will not: the field punishes those who disrupt the doxa. Pulling the algedonic cord means saying, publicly, that the system is not working. In an organisation where the system’s purpose is to appear to work, this is a career-limiting act. The accountability sink is the cybernetic expression of misrecognition: the system’s actual purpose is concealed from its participants by the very structures that reproduce it.

Beer’s solution is architectural: design systems where pain signals escalate automatically rather than being absorbed by process. The algedonic alert is the structural countermeasure to the accountability sink. But it only works if the culture permits it, which is why Beer’s structural model, powerful as it is, requires the conditions that Westrum’s generative culture describes.

6. Recursion: Why Decision Architecture Cannot Be Centralised

The most structurally radical feature of the VSM is recursion. Every viable system contains viable systems and is contained within a viable system. The same five-system structure applies at every level. A team is a viable system within a department within a division within a corporation. This is not hierarchy. It is nested autonomy.

The implication for the Deciding phase is profound. Decision quality cannot be designed at the top and deployed downward. A team adopting AI-assisted development is a viable system that must have its own System 1 (doing the work), System 2 (coordinating internally), System 3 (optimising its own operations), System 4 (sensing its own environment), and System 5 (maintaining its own identity in relation to the change). The programme management approach, a single AI strategy deployed uniformly, violates what Beer called the Recursive System Theorem: each viable system must develop its own viability at its own level of recursion.

This is where Beer and Evans converge most directly. Evans’s insistence that each bounded context needs its own model, its own language, its own team is the domain design expression of Beer’s recursive principle. You cannot have a single enterprise domain model for the same reason you cannot have a single enterprise decision architecture: the variety at each level of recursion is different, the environment at each level is different, and the decisions that matter at each level are different. The specification that works for the payments domain will not work for the fraud domain, because they are different viable systems operating in different environments with different variety requirements. Recursion is not a metaphor for decentralisation. It is the structural law that makes decentralisation a cybernetic necessity.

7. Cybersyn: The Test That History Interrupted

Beer was not merely a theorist. In 1971, Fernando Flores invited him to apply the VSM to the management of Chile’s nationalised economy under Salvador Allende. Project Cybersyn aimed to create a real-time information network connecting approximately 500 enterprises via telex to a central computer, processing production data through economic models, and reporting variables outside normal parameters. The critical design principle: consistent with both cybernetic theory and Allende’s political commitments, the system was designed to preserve worker and enterprise autonomy, not to implement centralised control. Each enterprise remained a viable system. The network provided coordination (System 2) and intelligence (System 4) without crushing operational autonomy (System 1).

Cybersyn reached an advanced prototype stage before the Pinochet coup destroyed it in 1973. Its legacy is double-edged. It demonstrates that Beer’s ideas were not merely academic; they were implementable at national scale. It also demonstrates that a system designed for autonomy requires a political context that values autonomy. Enterprise AI transformations face an internal version of the same lesson: the information flows the transformation requires may be incompatible with the political structures that govern the organisation. The architecture can be designed. Whether the politics will permit it is a different question, and one that Beer’s model, for all its power, cannot answer on its own.

8. Beer’s Limits: Structure Is Necessary but Not Sufficient

Beer must be read with his limitations visible. His framework is structural. It tells you what information flows must exist for an organisation to be viable. It does not tell you how to navigate the politics of who benefits from the current architecture. It does not tell you how to overcome the defensive routines that distort information regardless of the channels available. It does not model habitus, ontological security, or the emotional dynamics of change. Jackson’s critique is fair: the VSM is a unitary, functionalist model that assumes shared purpose and provides no mechanism for the democratic derivation of that purpose. The question “viable for whom?” is real, and Beer does not adequately address it.

Argyris addresses what Beer cannot: the psychological mechanisms that filter and suppress information regardless of architecture. Bourdieu explains why redesigning the architecture may reproduce the old power relations in the new structure. Heifetz names the adaptive challenge that Beer’s System 5 must hold but cannot create: the willingness to redefine identity when the environment demands it. Beer provides necessary structure. Culture, psychology, and politics provide the conditions that make the structure work.

The synthesis this series points toward: Beer gives you the architecture. Without the information flows he describes, viable decisions are structurally impossible. But the architecture is not sufficient. It must be animated by the generative culture Westrum describes, the psychological safety Edmondson identifies, and the holding environment Heifetz creates. Architecture without culture is a diagram. Culture without architecture is a wish. Viability requires both.

(An Organisational Prompt is something you can do now....)

Organisational Prompt

Map one stuck decision against the 3-4 homeostat.

Pick a decision about AI that has been circulating without resolution. Ask two questions. First: is this a System 3 decision (how do we do what we already do, better?) or a System 4 decision (should we be doing something different entirely?). Second: who is responsible for holding the tension between the two? If the answer is “a committee,” you have found your problem. Committees attenuate variety; they do not hold creative tension. Beer’s architecture requires a System 5 function: someone whose job is not to resolve the tension but to maintain it, ensuring the organisation neither optimises itself into blindness nor strategises itself into paralysis. If that function does not exist, create it. The decision will not unstick until the architecture permits it to move.

Further Reading

Stafford Beer: Brain of the Firm - The original statement of the Viable System Model, using the neurocybernetic metaphor. Dense but rewarding. This is where Beer makes the case that the structural requirements of a viable system are isomorphic regardless of substrate.

Stafford Beer: The Heart of Enterprise - The companion to Brain of the Firm. Develops the VSM from first principles using managerial rather than neuroscientific language. Contains the four principles, three axioms, and the formal statement of requisite variety as an organisational design constraint. The more rigorous of the two; read it if you want the architecture derived rather than illustrated.

Stafford Beer: Diagnosing the System for Organizations - The practical handbook. If you read only one Beer book, read this one. It walks you through the VSM as a diagnostic tool with worked examples. This is where POSIWID is formally stated.

Stafford Beer: Designing Freedom - The six Massey Lectures. Short, accessible, and passionately argued. Beer’s conviction that cybernetics is not about control but about liberty: the design of systems that maximise the freedom of their participants within the constraints of coherent purpose.

Dan Davies: The Unaccountability Machine - The best contemporary introduction to Beer’s ideas, applied to modern institutional failure. Davies’s concept of accountability sinks extends Beer’s framework into the question of why organisations that could know what is happening choose, structurally, not to know.

Eden Medina: Cybernetic Revolutionaries: Technology and Politics in Allende’s Chile - The definitive history of Project Cybersyn. Essential reading for anyone interested in the relationship between cybernetic architecture and political context. Medina shows both what Cybersyn achieved and the tensions between Beer’s systems thinking and the Chilean political reality.

Specification Driven Development and Organisational Change

Justin Arbuckle — Mon, 04 May 2026 07:01:26 GMT

Every organisation adopting AI is discovering the same thing: the bottleneck is not the technology. It is the ability to say, precisely, what you want. The developer who types a vague prompt into an AI coding assistant and receives useless code in return has not encountered a limitation of the model. They have encountered a limitation of their own clarity. The model will generate something. The question is whether that something is what was needed, and the answer depends entirely on whether anyone knew what was needed before the generation began.

This is not a new problem. It is the oldest problem in software engineering, restated with new urgency. What has changed is the cost of ambiguity. When a human developer writes code against an unclear requirement, the ambiguity is partially absorbed by the developer’s contextual knowledge, their experience with similar systems, their ability to ask clarifying questions mid-implementation. When an AI model generates code against an unclear specification, no such absorption occurs. The model generates the most statistically probable interpretation of the prompt. If the prompt is ambiguous, the output is confidently wrong. The feedback loop that human teams use to navigate ambiguity (the hallway conversation, the whiteboard sketch, the “is this what you meant?”) does not exist in the human-to-machine interface unless it is deliberately engineered.

Specification-Driven Development (SDD) is that deliberate engineering. It is the discipline of making the specification the authoritative artefact in the development process: not a byproduct of implementation, not documentation written after the fact, but the source of truth from which implementation, validation, testing, and documentation are derived. In the context of AI-augmented work, SDD is the mechanism by which human intent is translated into machine-executable constraint. It is, in the language of this series, the practice of clarity.

But the word “clarity” is misleading if it suggests that the practitioner begins with a clear understanding and merely transcribes it. The deeper truth, and the central argument of this article, is that clarity is not a precondition of specification. It is a product of it. You learn what you want by trying to say it precisely, seeing what the machine builds from your words, recognising the gap between your intention and your expression, and revising. The specification is not written once. It is iterated into existence, and each iteration teaches the author something they did not know they did not know.

1. The Practice of Saying What You Mean

A specification, in the SDD sense, is a structured description of what a system does: what it accepts, what it produces, and what constraints govern the boundary between them. In practice, the specification takes two forms. The human-authored form is natural language: user stories, acceptance criteria, domain constraints, and ecosystem requirements, written in markdown and versioned alongside the code. The machine-readable form (the OpenAPI contract, the JSON Schema, the test harness) is generated by the AI from the human-authored description. The human never needs to write YAML or JSON Schema. They need to describe their domain precisely enough that the AI can produce the correct technical artefacts. The distinction matters because machine-readability is what makes the validation loop possible, and the validation loop is what makes iterative learning possible.

The practice of specification is the practice of answering, at every point, questions you might defer. But here is the shift that AI-assisted development introduces: the questions that matter are no longer technical. An AI model will choose the HTTP method, the status codes, the JSON Schema syntax. It will generate the OpenAPI YAML. Those are solved problems. The questions the human must answer are domain questions: what is a sort code, and what makes one valid? What states can a payment move through, and in what order? Is a reference field mandatory, and how long can it be? Can a customer pay in euros, or only in sterling? What happens when the payment fails: does the money return immediately, or is there a holding period? These are questions that no AI model can answer from its training data, because the answers are specific to this bank, this regulatory environment, this product.

Each of these commitments is a small act of clarity about the domain. Individually, they seem trivial. Collectively, they constitute a complete, precise, testable description of system behaviour. And the act of making them forces the author to confront ambiguities that would otherwise travel silently into the implementation, surfacing as bugs, misunderstandings, and integration failures weeks or months later.

Consider a concrete example. A team at a retail bank is building an API for customer payments. In a natural language requirements document, the requirement might read: “The system should allow customers to pay someone.” This is a sentence. It tells you roughly what the system does. An AI model given the sentence will improvise.

But when the team sits down and describes the domain precisely (a payment comes from a specific account, goes to a payee identified by name, sort code, and account number, carries an amount in sterling, and includes a short reference), the AI can generate a formal specification from that domain knowledge. The team does not need to know OpenAPI syntax, HTTP methods, or JSON Schema. They need to know that sort codes are six digits in three pairs separated by hyphens, that account numbers are exactly eight digits, that payment references cannot exceed eighteen characters (a constraint imposed by the Faster Payments network), and that a payment moves through a specific lifecycle: pending, processing, completed, failed, or returned.

The AI produces the formal specification and the implementation. But the domain constraints that make both precise came from the humans. In the Spec Kit approach, what the team actually writes looks like this:

## Feature: Create Customer Payment

### User Story
As a customer, I want to pay someone from my bank account
so that I can transfer money to people and businesses.

### Data Constraints
- **Source account**: Identified by account ID
- **Payee**: Must include name, sort code, and account number
  - Sort code: six digits in three pairs separated by hyphens (e.g. 20-30-40)
  - Account number: exactly eight digits
- **Amount**: Must be a positive number, in pounds and pence (two decimal places maximum). Currency is GBP only.
- **Reference**: Free text, maximum 18 characters (Faster Payments network limit)

### Payment Lifecycle
A payment moves through these states: pending → processing → completed.
A payment can also move to "failed" or "returned" from processing.

### Acceptance Criteria
- A valid payment request returns a payment ID, status, and timestamp
- An invalid request (missing fields, malformed sort code, negative amount) is rejected with a clear error

This is what the humans write. The AI reads it and generates an OpenAPI contract, the JSON Schema with the regex patterns, the HTTP methods, the status codes, the request and response structures. The team never touches YAML.
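To make the link between the description and a generated artefact concrete, here is a minimal sketch of the kind of validation logic those constraints imply. It is an illustration, not the output of any particular tool. The field names, the error wording, and the choices to treat the reference as optional, to default a missing currency to GBP, and to treat completed, failed, and returned as terminal states are all assumptions that the description above leaves open.

```python
import re
from decimal import Decimal, InvalidOperation

SORT_CODE = re.compile(r"[0-9]{2}-[0-9]{2}-[0-9]{2}")  # e.g. 20-30-40
ACCOUNT_NUMBER = re.compile(r"[0-9]{8}")               # exactly eight digits
MAX_REFERENCE_LENGTH = 18                              # Faster Payments limit

# Allowed moves between states, taken from the Payment Lifecycle section.
# Assumption: completed, failed, and returned are terminal.
TRANSITIONS = {
    "pending": {"processing"},
    "processing": {"completed", "failed", "returned"},
    "completed": set(),
    "failed": set(),
    "returned": set(),
}


def validate_payment(request: dict) -> list[str]:
    """Return human-readable errors; an empty list means the request is valid."""
    errors = []
    payee = request.get("payee") or {}
    if not request.get("source_account_id"):
        errors.append("source account is required")
    if not payee.get("name"):
        errors.append("payee name is required")
    if not SORT_CODE.fullmatch(str(payee.get("sort_code", ""))):
        errors.append("sort code must look like 20-30-40")
    if not ACCOUNT_NUMBER.fullmatch(str(payee.get("account_number", ""))):
        errors.append("account number must be exactly eight digits")
    try:
        amount = Decimal(str(request.get("amount")))
    except InvalidOperation:
        errors.append("amount must be a number")
    else:
        if not amount.is_finite() or amount <= 0:
            errors.append("amount must be positive")
        elif amount.as_tuple().exponent < -2:
            errors.append("amount allows at most two decimal places")
    if request.get("currency", "GBP") != "GBP":
        errors.append("currency must be GBP")
    if len(request.get("reference") or "") > MAX_REFERENCE_LENGTH:
        errors.append("reference must be 18 characters or fewer")
    return errors


def can_transition(current: str, new: str) -> bool:
    """True if the lifecycle allows a payment to move from current to new."""
    return new in TRANSITIONS.get(current, set())


request = {
    "source_account_id": "acc-123",  # hypothetical identifier
    "payee": {"name": "A. Payee", "sort_code": "20-30-40",
              "account_number": "12345678"},
    "amount": "25.00",
    "reference": "Rent March",
}
print(validate_payment(request))               # []
print(can_transition("pending", "completed"))  # False
```

Every rule in the sketch traces back to a sentence the team wrote. Every place where the sketch had to guess (optional reference, terminal states) marks a domain question the description has not yet answered.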

2. Why the First Version Is Always Wrong

This is the central claim of this article:

The most important property of a specification is not that it is correct. It is that it is revisable.

The first version of any specification will be wrong. Not because the author is incompetent, but because the act of specifying reveals gaps in understanding that were invisible before the specification forced them into the open.

Karl Weick, the organisational theorist whose work on sensemaking has appeared throughout this series, captured this with his famous formula: “How can I know what I think until I see what I say?” Weick’s insight is that understanding is retrospective. You do not first understand and then express. You express, observe what you have expressed, and then understand what you meant.

The specification is the “saying.” The AI-generated output is the “seeing.” And the revision is the understanding.

Here is what this looks like in practice, continuing with the banking payment example.

Version 1. The team describes to the AI model what they want: an endpoint that retrieves a customer’s payment history. The AI generates a specification and an implementation. The team tests it. It returns every payment the customer has ever made, going back years, in a single response. For a customer with thousands of payments, the response is megabytes of JSON and takes seconds to return.

The team never mentioned pagination because they were thinking about what information to show, not about what happens when a customer has ten years of payment history. The AI generated exactly what was described: all payments, in an array, in one response. The gap was not technical. It was a domain gap: the team had not yet thought about the scale of their own data.

The description the team gave the AI was simple:

## Feature: Payment History

### User Story
As a customer, I want to see my payment history
so that I can review past transactions.

### Data
Each payment should include: payment ID, payee name, amount,
status, date, sort code, account number, and reference.

No mention of pagination. No mention of sorting. No mention of filtering. The AI generated a working implementation from exactly this description. What was missing was domain knowledge the team had not yet articulated.

Version 2. The team tells the AI: “Customers can have thousands of payments. We need to return them in pages, no more than a hundred at a time.” The AI revises the specification, adding pagination parameters, response metadata, and constraints. The team regenerates. Now the response is paginated. But the team notices that payments are returned in no particular order. Some pages show recent payments mixed with payments from years ago. The team had not thought about ordering because the requirement for ordering comes from knowing how service agents actually work: they almost always start with the most recent payments.

Version 3. The team tells the AI: “Payments should be sorted by date, most recent first by default. Customer service agents (human, not AI) also need to filter by status and search by payee name.” These are operational insights (knowledge about how the system is actually used), not technical requirements. The AI revises the specification accordingly. But when the team regenerates, they notice the response includes the full detail of every payment: payee sort code and account number, internal processing timestamps, fraud check results, and the full audit trail. This is an ecosystem problem: the team now realises that this API will be consumed by the customer-facing mobile app and by internal operations dashboards, and those channels have different data sensitivity requirements. Sort codes and fraud scores must not reach the mobile channel.

Version 4. The team tells the AI: “The list endpoint should return only a summary: payment ID, payee name, amount, status, and date. The full detail, including sort code, account number, and processing timeline, belongs on a separate detail endpoint. And we need a channel indicator so the API knows whether the caller is the customer app or an internal tool; customer-facing channels must not see fraud scores or processing metadata.” The AI revises, creating separate schemas and adding channel-based visibility rules. The team regenerates. The list is fast, the detail is comprehensive, and the API respects the ecosystem’s data sensitivity boundaries.
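A channel-based visibility rule of this kind can be sketched in a few lines. This is one possible reading, not the generated implementation: the field names and the `Channel` enum are hypothetical, and the sketch assumes the customer channel is limited to the five summary fields while the internal channel additionally sees sort code, account number, fraud score, and processing metadata.

```python
from enum import Enum


class Channel(Enum):
    CUSTOMER = "customer"  # e.g. the mobile app
    INTERNAL = "internal"  # e.g. an operations dashboard


SUMMARY_FIELDS = ("payment_id", "payee_name", "amount", "status", "date")
INTERNAL_EXTRA_FIELDS = ("sort_code", "account_number",
                         "fraud_score", "processing_metadata")

# Allow-list per channel: a field is hidden unless it is explicitly listed.
VISIBLE_FIELDS = {
    Channel.CUSTOMER: SUMMARY_FIELDS,
    Channel.INTERNAL: SUMMARY_FIELDS + INTERNAL_EXTRA_FIELDS,
}


def visible_view(payment: dict, channel: Channel) -> dict:
    """Copy only the fields the calling channel is allowed to see."""
    allowed = VISIBLE_FIELDS[channel]
    return {field: payment[field] for field in allowed if field in payment}
```

The sketch uses an allow-list rather than a deny-list on purpose: if a new sensitive field is later added to the payment record, it stays hidden from every channel until someone decides, explicitly, who may see it.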

By Version 4, the description the team gives the AI has evolved into something substantially different from where they started:

## Feature: Payment History (v4)

### User Story
As a customer, I want to browse my payment history in manageable pages
so that I can find specific transactions without loading years of data.

As a service agent, I want to search and filter payment history
so that I can quickly locate a customer's transaction during a call.

### Pagination
- Results are returned in pages. Default page size: 20. Maximum: 100.
- Response includes: current page, page size, total pages, total items.

### Sorting
- Supported sort options: date ascending, date descending, amount ascending, amount descending
- Default sort: most recent first (date descending)

### Filtering
- Filter by payment status (pending, processing, completed, failed, returned)
- Search by payee name (partial match)

### Channel Sensitivity
- The API must know whether the caller is the customer app or an internal tool
- **Customer channel**: Show payment ID, payee name, amount, status, date only
- **Internal channel**: Additionally show sort code, account number, fraud scores,
  and processing metadata
- Fraud scores and processing metadata must never reach the customer channel

### List vs Detail
- The list endpoint returns a summary only (ID, payee name, amount, status, date)
- A separate detail endpoint returns the full payment record including sort code,
  account number, reference, and processing timeline

Every lesson the team learned (pagination, sorting, filtering, channel sensitivity, the separation of summary from detail) is now captured in the description. The AI generates the technical artefacts (OpenAPI contracts, JSON Schemas, response structures) from this. The implementation generated from this version cannot contain the mistakes that the first version’s output exhibited, because the domain and ecosystem constraints now rule them out.
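As an illustration of what the v4 description pins down for the list endpoint, the pagination, sorting, and filtering rules can be sketched as a single function. This is a sketch under stated assumptions, not the artefact the AI would necessarily produce: the function name, parameter names, and sort keys are hypothetical, an over-limit page size is rejected rather than clamped, payee search is case-insensitive, and dates are assumed to be ISO-format strings so that they sort correctly as text.

```python
import math

DEFAULT_PAGE_SIZE = 20
MAX_PAGE_SIZE = 100
# Sort option name -> (field to sort on, descending?)
SORT_OPTIONS = {
    "date_desc": ("date", True),
    "date_asc": ("date", False),
    "amount_desc": ("amount", True),
    "amount_asc": ("amount", False),
}
STATUSES = {"pending", "processing", "completed", "failed", "returned"}
SUMMARY_FIELDS = ("payment_id", "payee_name", "amount", "status", "date")


def list_payments(payments, page=1, page_size=DEFAULT_PAGE_SIZE,
                  sort="date_desc", status=None, payee=None):
    """Filter, sort, and paginate payments, returning summaries only."""
    if page < 1 or not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError("page must be >= 1 and page_size between 1 and 100")
    if sort not in SORT_OPTIONS:
        raise ValueError(f"unsupported sort option: {sort}")
    if status is not None and status not in STATUSES:
        raise ValueError(f"unknown status: {status}")

    # Filtering: exact status match, partial (case-insensitive) payee match.
    matches = [
        p for p in payments
        if (status is None or p["status"] == status)
        and (payee is None or payee.lower() in p["payee_name"].lower())
    ]

    # Sorting: most recent first unless the caller asks otherwise.
    key, descending = SORT_OPTIONS[sort]
    matches.sort(key=lambda p: p[key], reverse=descending)

    # Pagination: slice one page and strip each record down to a summary.
    start = (page - 1) * page_size
    items = [{f: p[f] for f in SUMMARY_FIELDS}
             for p in matches[start:start + page_size]]
    return {
        "items": items,
        "page": page,
        "page_size": page_size,
        "total_items": len(matches),
        "total_pages": math.ceil(len(matches) / page_size),
    }
```

Each guard and default in the function corresponds to a line in the v4 description. The points where the sketch had to choose (reject or clamp, case sensitivity) are exactly the kind of question a fifth iteration would surface.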

Four versions. Each one taught the team something they did not know when they started. Not about the technology; about their own requirements. They did not know they needed pagination until they saw a response without it. They did not know they needed sorting until they saw unordered results. They did not know they needed filtering until they imagined the customer service agent searching for a specific payment. They did not know they needed channel-sensitive visibility until they saw sort codes and fraud scores in a response destined for the mobile app.

You may object that details such as these are usually known and specified before development begins, so the example above is not representative. They are obvious constraints. That is true, but consider the broader implication: the AI is working as a partner that prompts you to ask the right questions. This is remarkably similar to a typical session with a business analyst. The only difference is that the distance to implementation has shrunk to close to zero.

So, the process of iteration is the learning process working as intended. The specification is the artefact that makes the learning visible and cumulative. Each version is preserved in version control. Each change has a reason. The version history is a record of the team’s increasing understanding of their own domain.

While everything else in technology is shifting left, learning is shifting right: closer to the point of delivery.

3. The Collapse of the Development Lifecycle

The traditional software development lifecycle is a sequence of phases: requirements gathering, design, implementation, testing, deployment. In practice, these phases are separated by handoffs. Business analysts write requirements or user stories and hand them to architects. Architects produce designs and hand them to developers. Developers write code and hand it to testers. Testers find defects and hand them back to developers. Each handoff introduces delay, information loss, and the opportunity for misinterpretation. A two-week sprint contains perhaps three or four days of actual implementation, buffered by meetings, handoff ceremonies, context-switching, and the friction of translating one artefact (the requirement) into another (the design) and then into another (the code). In many organisations, developers spend less than 50% of their time actually writing code.

AI-assisted specification-driven development compresses this sequence until the phases are no longer distinct. When the team describes what they need and the AI generates both the specification and the implementation, the gap between “define what we want” and “see what we get” shrinks from days or weeks to minutes. The team describes a requirement, the AI generates a specification and implementation, the tests run, the team observes the result and revises their description, all in a single sitting. What was a multi-week cycle involving multiple handoffs between multiple roles becomes a tight loop executed by a small group in real time.

This is not merely faster. It is structurally different. In the traditional lifecycle, the feedback signal is slow and noisy. A business analyst writes a requirement in week one. A developer interprets it in week three. A tester finds a discrepancy in week five. By the time the defect report reaches the business analyst, the original context has faded. The analyst must reconstruct why they wrote the requirement the way they did. The developer must reconstruct why they interpreted it the way they did. The reconstruction is lossy. Information has decayed.

In the compressed cycle, the feedback signal is immediate and precise. The team describes a requirement at 10am. The AI generates the specification and implementation at 10:02. The tests run at 10:03. By 10:05, the team is looking at specific test failures that reveal specific gaps in their description. The context is fresh. The people who described the requirement are the same people looking at the failures. There is no handoff, no delay, no reconstruction. The gap between intention and observation is minutes, not weeks.

The constraint, and it is a binding constraint, is not the speed of generation. It is the speed of human understanding. The AI can regenerate in seconds. The team cannot rethink their domain model in seconds. Each iteration requires the humans in the room to look at the output, understand what is wrong, diagnose whether the problem is in the specification or the implementation, and decide how to revise. This thinking cannot be compressed below a certain threshold. But the elimination of all the other time; the handoffs, the context-switching, the waiting for someone else to do their part; means that the thinking is the only thing left. The development lifecycle has been compressed to its irreducible core: the time it takes humans to understand what they actually want.

In the compressed cycle, a team might execute four iterations in a morning. The payment history example from Section 2; four versions, each revealing something new about pagination, sorting, channel sensitivity, and data visibility; could be completed before lunch. The team that would have spent a quarter learning what they needed could learn it in a day.

4. Who Writes the Specification: The End of the Handoff

The collapse of the development lifecycle has a direct consequence for who does the work. In the traditional model, the roles are separated because the phases are separated. Business people define requirements. Technical people implement them. The two groups work in sequence, communicating through documents that travel between them. The business analyst writes a requirements document, emails it to the technical lead, and waits. The technical lead reads it, has questions, schedules a meeting for next week, and waits. The meeting produces partial answers and new questions. The cycle continues.

This separation was never ideal, but it was economically rational when implementation was the bottleneck. If writing the code takes weeks, there is no point having the business analyst sit beside the developer for the duration. The business analyst’s time is better spent on the next set of requirements while the developer works through the current ones. The handoff is a concession to the economics of slow implementation.

When implementation takes seconds rather than weeks, the economics reverse. The bottleneck is no longer writing the code. It is knowing what the code should do. And knowing what the code should do requires two kinds of knowledge that almost never reside in the same person:

Domain knowledge (what the business needs, what the customer expects, what the regulations require, what the edge cases look like in practice).
Ecosystem knowledge (what the downstream systems expect, what format the Faster Payments gateway requires, what data the mobile app can safely display, what the core banking platform’s rate limits are, what the fraud detection service needs to see).

The specification is the meeting point of these two kinds of knowledge.

And it cannot be produced well unless both kinds are present simultaneously. A domain expert working with the AI alone will produce something that captures the business intent but misses ecosystem constraints: they might describe a “payment type” without realising that the downstream Faster Payments gateway expects a specific ISO 20022 message format with mandatory fields that the specification must accommodate. A technical person working with the AI alone will produce something that is structurally rigorous but functionally weak: the AI will generate correct schemas and validation rules, but the specification will not capture the business rule that new payees require a 24-hour cooling-off period before the first payment is released, or that payments above £25,000 require a second authorisation step.

The practice that emerges, and that some effective teams are already adopting, resembles pair programming more than any traditional requirements process. A domain expert and a technical person sit together; literally or virtually, but synchronously; and describe the system’s behaviour in conversation, letting the AI generate the specification from their descriptions. The domain expert says: “When a customer makes a payment, we need to check they have sufficient funds.” The technical person asks: “Do we use the current balance, or do we need to account for pending payments that haven’t cleared yet? The core banking platform exposes both a current_balance and an available_balance; they diverge whenever there are pending outbound payments.” The domain expert pauses: “We use available balance, which already accounts for pending outbound payments. But there is a complication; customers have a daily payment limit, and it is different for different account types. Standard accounts have a £25,000 daily limit. Premium accounts have £100,000.” The technical person: “That means the check has to aggregate all of today’s completed and pending payments. And ‘today’ is going to be tricky; the core banking platform uses UTC, but the customer-facing daily limit resets at midnight UK time. We need to be clear about which timezone governs the boundary, or, during British Summer Time, payments made in the hour between midnight UK time and midnight UTC will be counted against the wrong day’s total.” The domain expert: “UK time. And there is another thing. If the payee has never been paid before, the first payment is held for 24 hours before processing. It is a fraud prevention measure.”

This conversation is producing a specification. Not a requirements document to be interpreted later, but a description precise enough that the AI can generate a formal specification and test it immediately. The domain expert provides the business rules and the “why.” The technical person provides the ecosystem awareness: what the downstream systems expose, where the integration boundaries create complications, what the platform constraints are. Neither could produce the specification alone. The domain expert does not know that the core banking platform uses UTC while the business rule operates on UK time. The technical person does not know that new-payee payments have a cooling-off hold.
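The timezone point raised in the conversation can be made concrete. The following is a minimal sketch, assuming that "UK time" means the `Europe/London` zone and using a hypothetical helper name, `uk_day_bounds_utc`; it is an illustration of the boundary conversion, not the platform's actual logic. It returns the UTC window that corresponds to the UK calendar day containing a given instant.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: "UK time" in the business rule means the Europe/London zone.
UK = ZoneInfo("Europe/London")


def uk_day_bounds_utc(instant_utc: datetime) -> tuple[datetime, datetime]:
    """Return the [start, end) window, in UTC, of the UK day containing instant_utc."""
    local_date = instant_utc.astimezone(UK).date()
    start_local = datetime.combine(local_date, time.min, tzinfo=UK)
    end_local = datetime.combine(local_date + timedelta(days=1), time.min, tzinfo=UK)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)


# A payment stamped 23:30 UTC on 1 July is already 00:30 on 2 July in the UK,
# because British Summer Time is one hour ahead of UTC.
paid_at = datetime(2025, 7, 1, 23, 30, tzinfo=timezone.utc)
start, end = uk_day_bounds_utc(paid_at)
```

In summer the window runs from 23:00 UTC to 23:00 UTC, so a daily total computed on UTC dates would assign this payment to the wrong day. In winter, when the UK is on GMT, the two boundaries coincide and the error disappears, which is what makes it easy to miss.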

The specification that emerges from this conversation captures each business rule in natural language precise enough for the AI to generate machine-enforceable constraints:

### Business Rules: Payment Validation

**Rule 1: Sufficient Funds**
The payment amount must not exceed the available balance on the source account.
Available balance already accounts for pending outbound payments; do not use
the current balance, which does not.
If the payment exceeds the available balance, reject it with a clear error
showing both the requested amount and the available balance.

**Rule 2: Daily Payment Limit**
The sum of today's completed and pending payments, plus this new payment,
must not exceed the account's daily limit.
- Standard accounts: £25,000 per day
- Premium accounts: £100,000 per day
"Today" is determined by UK time (GMT/BST), not UTC. The core banking
platform uses UTC internally, so the boundary must be converted.

**Rule 3: New Payee Cooling-Off**
If the payee has never previously received a payment from this customer,
the first payment is held for 24 hours before processing.
This is a fraud prevention measure. The payment is accepted but with
status "held". It is released automatically 24 hours after the payee
was added to the customer's payee list.

### Ecosystem Context
- The source account exposes two balance fields: `current_balance` and
  `available_balance`. Use `available_balance` for this check.
- The account object includes `daily_limit` and `account_type` (standard or premium).

Neither person could produce this alone. The domain expert contributed the business rules: sufficient funds, daily limits by account type, cooling-off periods for new payees. The technical person contributed the ecosystem knowledge: the distinction between current and available balance in the core banking platform, the UTC-versus-UK-time timezone boundary that determines when the daily limit resets, the fact that available_balance already accounts for pending outbound payments. The AI generates the technical artefacts; the validation logic, the error response contracts, the account schema; from this combined description. The specification is the meeting point of domain and ecosystem, not of business and YAML.
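To give a sense of what the generated validation logic might look like, here is a minimal sketch of the three rules. The names `Account`, `validate_payment`, `spent_today`, and `payee_is_new` are hypothetical, the order in which the rules are checked is an assumption, and the release timing of held payments is not modelled. The caller is assumed to have already summed today's completed and pending payments using UK-time day boundaries.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Account:
    available_balance: Decimal  # already net of pending outbound payments
    daily_limit: Decimal        # e.g. 25,000 (standard) or 100,000 (premium)
    account_type: str           # "standard" or "premium"


def validate_payment(account: Account, amount: Decimal,
                     spent_today: Decimal, payee_is_new: bool) -> tuple[str, str]:
    """Return (status, reason) for a proposed payment.

    spent_today: sum of today's completed and pending payments ("today" by UK time).
    """
    # Rule 1: sufficient funds, checked against the available balance.
    if amount > account.available_balance:
        return ("rejected",
                f"requested {amount} exceeds available balance {account.available_balance}")
    # Rule 2: daily limit, including this new payment.
    if spent_today + amount > account.daily_limit:
        return ("rejected", f"daily limit of {account.daily_limit} would be exceeded")
    # Rule 3: first payment to a new payee is accepted but held.
    if payee_is_new:
        return ("held", "new payee: 24-hour cooling-off applies")
    return ("accepted", "")
```

Each branch maps to exactly one rule in the specification, which is what lets a failing check be traced back to the business rule it encodes.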

The pairing model works because the compressed lifecycle makes the feedback loop tight enough that both participants stay engaged. In the traditional handoff model, the domain expert writes the requirement and moves on; by the time the questions come back, they are thinking about something else. In the pairing model, the questions arise in real time, are answered in real time, and are immediately fed to the AI for specification generation. The domain expert sees the output take shape and can correct misunderstandings before they propagate. The technical person hears the business reasoning and can raise ecosystem constraints that the domain expert would never have considered. The specification that emerges is better than either person could produce alone, and it is produced in a fraction of the time that the traditional handoff process would require.

This has organisational implications that most enterprises have not yet confronted. The separation of business and technology into distinct departments, with distinct reporting lines, distinct planning cycles, and distinct physical locations, was rational when the work was sequential. When the work becomes simultaneous, the separation becomes an obstacle. The domain expert and the technical person need to be available to each other in the same moment, not the same week. The organisations that will move fastest in AI-driven development are those that can form these pairs; or small groups of three or four (think Amazon’s two-pizza teams), for more complex domains; and give them the authority and the time to iterate specifications together without waiting for approval from parallel governance processes.

5. The Redistribution of Roles

The pairing model described in Section 4 is the transitional state, not the final one. As AI-assisted specification tools mature, three traditional roles; the business analyst, the software engineer, and the architect; are being fundamentally reshaped. The redistribution is not a minor adjustment. It is a structural change to how enterprises organise the work of building systems.

The business analyst becomes the specification author. The argument of this article has been that domain knowledge is the bottleneck. The person who knows the domain best is typically the business analyst: they understand the business rules, the regulatory constraints, the customer journeys, and the edge cases that arise in practice. In the transitional state, they pair with an engineer because the tools require some technical fluency to operate. But specification tools are rapidly eliminating that requirement. When the human-authored artefact is natural language markdown; user stories, acceptance criteria, domain constraints, and ecosystem context; the person who can write it most accurately is the person who knows the domain, not the person who knows the technology. The business analyst who today pairs with an engineer to produce specifications will increasingly work with AI directly, describing their domain in structured natural language and reviewing the generated technical artefacts for domain accuracy. They do not need to understand the OpenAPI contract the AI generates. They need to verify that the contract captures the right business rules. This is knowledge they already have.

The implication is significant: the business analyst’s role shifts from writing requirements documents for others to interpret into authoring specifications that the AI acts on directly. The specification is no longer a communication artefact between humans. It is an instruction set for machines. The quality bar changes accordingly. A requirements document that says “the system should handle edge cases appropriately” is acceptable when a human developer will use their judgement to decide what “appropriately” means. A specification that says the same thing will produce AI-generated code that handles no edge cases at all, because “appropriately” is not a constraint the model can act on. The business analyst must learn to be precise, not technical. Precision about domain rules is a different skill from technical fluency, and it is a skill that domain experts can develop faster than engineers can develop domain expertise.

The engineer becomes the integration architect. If AI generates the implementation from specifications, and the business analyst authors the specifications, what does the engineer do? The answer is not “nothing.” It is “something fundamentally different.” The engineer’s role shifts from writing application code to engineering the contracts that enable communication between AI-generated system components.

In an enterprise with dozens of teams, each producing AI-generated services from their own specifications, the critical challenge is not what happens inside each service. It is what happens between them.

Do the services share a common semantic model for core concepts like “customer,” “account,” and “transaction”? Do they apply consistent security policies? Do they handle errors in compatible ways? Do they version their contracts so that changes to one service do not break its consumers? Do they share authentication and authorisation patterns? These are engineering problems, but they are problems of the integration fabric, not of the application logic.

This is the role that Eric Evans, in Domain-Driven Design, identified as context mapping: understanding how different bounded contexts relate to each other and managing the translations between them. When the payments team’s specification refers to a “customer” and the accounts team’s specification also refers to a “customer,” are they the same concept? Often they are not: the payments context cares about payment limits and payee lists; the accounts context cares about balances and interest rates. The engineer’s job is to make these relationships explicit, to define the contracts at the boundaries, and to ensure that the ecosystem of specifications is semantically coherent even when individual teams work independently within their own domains. Domain-Driven Design, and its application to specification-driven development, will be explored in depth in a future article in this series.
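The "customer" example can be sketched in code. This is an illustrative assumption about how an explicit boundary translation might look, not a prescribed design; the class and function names (`AccountsCustomer`, `PaymentsCustomer`, `to_payments_customer`) are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class AccountsCustomer:
    """'Customer' as the accounts context sees it."""
    customer_id: str
    balance: Decimal
    interest_rate: Decimal


@dataclass(frozen=True)
class PaymentsCustomer:
    """'Customer' as the payments context sees it."""
    customer_id: str
    daily_limit: Decimal
    payee_ids: tuple[str, ...]


def to_payments_customer(source: AccountsCustomer, daily_limit: Decimal,
                         payee_ids: tuple[str, ...]) -> PaymentsCustomer:
    """Boundary translation: only the shared identifier crosses the boundary.

    Balances and interest rates stay in the accounts context; limits and
    payee lists are supplied by the payments context's own data.
    """
    return PaymentsCustomer(source.customer_id, daily_limit, payee_ids)
```

Making the translation a named function, rather than letting both teams share one "customer" type, keeps the difference between the two concepts visible at the boundary.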

The architect becomes the domain decomposer. If the business analyst authors specifications within a domain and the engineer maintains the contracts between domains, someone must decide where the domain boundaries are. This is the architect’s role, and it is arguably more consequential than the traditional architectural function of selecting technologies and designing solutions. The architect’s primary task becomes breaking the enterprise down into functional domains and bounded contexts: self-contained areas of business capability, each with its own ubiquitous language, its own specification surface, and its own team. The payments domain. The customer onboarding domain. The fraud detection domain. The regulatory reporting domain. Each domain becomes a bounded context in which a team can develop autonomously, guided by the specifications they author within that context.

Matthew Skelton and Manuel Pais, in Team Topologies, provide the organisational counterpart to this architectural decomposition. Their model describes four fundamental team types: stream-aligned teams that own a domain and deliver value within it; platform teams that provide shared capabilities (including, in this context, shared specification standards, contract testing infrastructure, and security policy templates); enabling teams that help other teams develop new capabilities (such as specification maturity); and complicated-subsystem teams that own technically complex components requiring specialist knowledge. The architect’s decomposition of the enterprise into domains and bounded contexts directly determines how stream-aligned teams are formed and what they own. A poorly drawn boundary forces a team to coordinate across two domains simultaneously; a well-drawn boundary gives the team autonomy within a coherent scope. Here, too, Amazon got there early: the Bezos API Mandate required every team to expose its data and functionality through service interfaces. Team Topologies, and its implications for specification-driven organisations, will also be the subject of a future article in this series.

The redistribution of roles constitutes a major change to how enterprises develop software. The business analyst moves from the periphery of the development process to the centre: they are the person whose knowledge is the binding constraint, and the tools are increasingly shaped to serve them directly. The engineer moves from writing application code to maintaining the integration fabric that holds the enterprise’s ecosystem of specifications together. The architect moves from designing solutions to decomposing problems: defining the boundaries within which teams and their specifications operate. None of these roles disappears. Each is transformed. And the transformation is driven by the same underlying shift: when AI can generate implementations from specifications, the human work moves to the places where human judgement is irreplaceable; domain knowledge, semantic coherence, and structural decomposition.

6. The Validation Loop: Where the Learning Signal Lives

The mechanism that makes iterative specification productive, rather than merely repetitive, is the validation loop. Without validation, iteration is just guessing with extra steps. With validation, each iteration produces structured information about the gap between intent and expression.

The loop has five steps.

1. The specification encodes domain constraints. The business rules, ecosystem boundaries, and operational requirements that the team described have been formalised by the AI into types, required fields, value ranges, enumerated values, and structural relationships. These are not aspirational. They are mechanical: a validator can check each one with a binary pass or fail.
2. The AI generates a candidate output. Code, data, or a structured action that attempts to satisfy the specification.
3. A validator checks conformance. Does the output match the specification? This check can be simple (does the JSON structure match the schema?) or complex (does the generated code pass a test suite derived from the specification?).
4. On failure, the validation errors are fed back. This is the critical step. The error is not “try again.” It is specific, structured feedback: “field ‘amount.value’ expected a positive number, received -50.00” or “test case ‘payment to new payee within cooling-off period’ expected held status, received processing.” The model uses this feedback to produce a targeted correction rather than regenerating from scratch.
5. On success, the output is guaranteed conformant. It can be deployed, integrated, or passed to the next stage.
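The five steps above can be sketched as a small loop. This is a toy illustration under stated assumptions: `validate` checks a single constraint, `generate_until_valid` and `fake_model` are hypothetical names, and `fake_model` stands in for a real model call that would receive the validation errors as part of its next prompt.

```python
from typing import Callable


def validate(candidate: dict) -> list[str]:
    """Toy validator: return structured error messages (empty list if conformant)."""
    errors = []
    value = candidate.get("amount", {}).get("value")
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number or value <= 0:
        errors.append(
            f"field 'amount.value' expected a positive number, received {value!r}")
    return errors


def generate_until_valid(generate: Callable[[list[str]], dict],
                         max_attempts: int = 3) -> dict:
    """Generate, validate, and feed errors back until valid or out of attempts."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        candidate = generate(feedback)   # step 2: candidate output
        feedback = validate(candidate)   # step 3: conformance check
        if not feedback:
            return candidate             # step 5: conformant output
        # step 4: the loop repeats with the specific errors as feedback
    raise RuntimeError(
        f"no conformant output after {max_attempts} attempts: {feedback}")


def fake_model(feedback: list[str]) -> dict:
    """Stand-in for a model call: corrects itself once it has seen feedback."""
    value = 50.0 if feedback else -50.0
    return {"amount": {"value": value, "currency": "GBP"}}


result = generate_until_valid(fake_model)
```

The retry limit matters: without it, a specification that the generator cannot satisfy would loop indefinitely rather than surfacing the failure to the team.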

This loop can execute automatically, iterating until valid output is produced or a retry limit is reached. But its deeper significance is not automation. It is the creation of a learning signal. Every validation failure is information about what the specification constrains, what the model misunderstood, and, crucially, what the specification failed to constrain.

This last category is where the real learning happens. When the AI generates output that is structurally valid but semantically wrong; the JSON is well-formed but the business logic is backwards; the automated validation passes but the human reviewer catches the problem. The response is not to blame the model. It is to revise the specification: to add the constraint that was missing, to tighten the schema, to make explicit what was previously assumed.

This revision is the double-loop learning that Chris Argyris argued organisations must practise if they are to learn. Single-loop learning corrects the implementation within existing assumptions: the code is wrong, fix the code. Double-loop learning questions the assumptions themselves: the specification is incomplete, revise the specification. The specification is where the assumptions live. You cannot question assumptions you cannot see. You cannot see them until they are written down as commitments in a formal artefact.

7. Tests as the Evidence Chain

In the compressed development cycle, tests occupy a fundamentally different position than in traditional software engineering. In the traditional model, tests are written after the implementation, often by a separate team, and they validate that the code does what the developer intended. The tests are coupled to the implementation. When the implementation changes, the tests must change with it. The tests tell you whether the code works. They do not tell you whether the code is right; whether it satisfies the original requirement.

In specification-driven development, the tests are derived from the specification, not from the implementation. This inversion changes everything. A test derived from the specification is not asking “does the code do what the developer intended?” It is asking “does the code do what the specification says?” And because the specification is the authoritative definition of system behaviour, a test that passes is evidence of conformance to the contract.

This creates a traceable chain from business intent through specification to test to implementation:

The domain expert says: “A customer cannot make a payment that exceeds their available balance.” This intent is encoded in the specification as a validation rule on the payment endpoint: the endpoint must reject requests where the payment amount exceeds the customer’s available balance, returning a 422 Unprocessable Entity with an error message identifying the shortfall. From this specification clause, a test is derived:

Scenario: Payment exceeds available balance
  Given a customer with an available balance of £1,200
  When the customer creates a payment for £1,500
  Then the payment is rejected with a 422 response
  And the response body contains "exceeds available balance"
  And the response body contains the available balance of £1,200

This test is traceable. It points to a specific clause in the specification. The specification clause points to a specific business rule articulated by the domain expert. If the test fails, the team knows not just that something is broken but which business rule is violated. If the specification changes (perhaps the business decides to allow an arranged overdraft to cover the shortfall), the test is updated to match, and the link between business rule, specification, and test is preserved.
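The same scenario can be expressed as an executable assertion. The sketch below is hypothetical: `create_payment` is a stand-in for the real endpoint, the response field names and the 201 success status are assumptions, and the docstring reference to Rule 1 illustrates how a test might carry its link back to the specification.

```python
from decimal import Decimal


def create_payment(available_balance: Decimal, amount: Decimal) -> tuple[int, dict]:
    """Hypothetical stand-in for the payment endpoint: returns (status_code, body)."""
    if amount > available_balance:
        return 422, {
            "error": "Payment exceeds available balance",
            "requested_amount": str(amount),
            "available_balance": str(available_balance),
        }
    return 201, {"status": "processing"}


def test_payment_exceeds_available_balance() -> None:
    """Traces to: Business Rules, Rule 1 (Sufficient Funds)."""
    status, body = create_payment(Decimal("1200.00"), Decimal("1500.00"))
    assert status == 422
    assert "exceeds available balance" in body["error"]
    assert body["available_balance"] == "1200.00"


test_payment_exceeds_available_balance()
```

The test asserts only on the response contract, so it says nothing about how the endpoint arrives at its answer.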

Now consider what this traceability enables when the AI regenerates the implementation. In the traditional model, regenerating the implementation would be terrifying: you would need to re-test everything manually to confirm that the new code still satisfies all requirements. In the specification-driven model, the test suite is the specification expressed as executable assertions. Regenerate the implementation, run the test suite, and every passing test is evidence that the new implementation satisfies the corresponding specification clause. Every failing test is a precise signal: this specific business rule, encoded in this specific specification clause, is not satisfied by the new implementation.

The test suite becomes the bridge that makes rapid iteration safe. The team can revise the specification and regenerate with confidence because the tests will catch any regression. The tests are not coupled to the implementation (they do not care how the code works, only that it produces the right outputs for the right inputs), so a completely different implementation; different algorithms, different variable names, different internal structure; will pass the same tests as long as it satisfies the same specification. This decoupling is what makes AI regeneration practical. The AI might generate a completely different implementation each time. The tests do not care. They check the contract, not the code.
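A small sketch makes the decoupling visible. Two deliberately different implementations of the daily-limit check (both hypothetical, with amounts as whole pounds for simplicity) are run against one contract test that knows nothing about their internals.

```python
def within_daily_limit_v1(payments_today: list[int], new_amount: int, limit: int) -> bool:
    """First generation: explicit accumulation loop."""
    total = 0
    for amount in payments_today:
        total += amount
    return total + new_amount <= limit


def within_daily_limit_v2(payments_today: list[int], new_amount: int, limit: int) -> bool:
    """Second generation: different internals, same contract."""
    return sum(payments_today, new_amount) <= limit


def contract_test(impl) -> None:
    """Derived from the rule: the total must not exceed the daily limit."""
    assert impl([10_000, 14_000], 1_000, 25_000) is True    # exactly at the limit
    assert impl([10_000, 14_000], 1_001, 25_000) is False   # one pound over
    assert impl([], 25_000, 25_000) is True


for implementation in (within_daily_limit_v1, within_daily_limit_v2):
    contract_test(implementation)
```

Either implementation could be swapped for the other, or for a third the AI generates tomorrow, without touching the test.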

The traceability chain also serves a governance function.

When an auditor, a regulator, or a security reviewer asks “how do you know this system enforces payment limits?”, the answer is not “we wrote the code carefully.” The answer is: here is the business rule, here is the specification clause that encodes it, here is the test that verifies it, here is the test result from the most recent deployment, and here is the version history showing when the rule was introduced and how it has evolved.

The entire chain is documented, versioned, and mechanically verified. This is a level of traceability that most organisations aspire to and few achieve, because in the traditional model it requires enormous manual discipline to maintain the links between requirements, design, implementation, and tests. In the specification-driven model, the traceability is structural: the tests are derived from the specification, so the link is maintained by construction rather than by manual discipline, provided the tests are re-derived whenever the specification changes.

Returning to the payment history example from Section 2, each version of the specification should have produced a corresponding evolution in the test suite:

Version 1’s tests verified that the endpoint returned payment objects with the correct schema. When the team discovered the pagination problem, the failure was not a test failure; it was a performance observation. This is the signal that the test suite was incomplete: no test had asserted pagination behaviour because the specification had not defined it.

Version 2 added pagination to the specification, and new tests were derived: a test that requests page 1 and verifies the pagination metadata, a test that requests a page beyond the total and verifies an empty result or appropriate error, a test that sets per_page to 200 and verifies that the maximum of 100 is enforced.

Version 3 added sorting and filtering, and new tests followed: a test that requests created_at_desc and verifies that the first payment has the most recent timestamp, a test that filters by status=completed and verifies that no other statuses appear in the results, a test that searches by payee_name and verifies that only matching payments are returned.

Version 4 separated the list and detail schemas and introduced channel sensitivity, and the tests diverged accordingly: list endpoint tests verify that the response contains only summary fields (id, payee_name, amount, status, created_at), while detail endpoint tests verify the full schema including sort code, account number, and reference. A new category of tests verifies channel visibility: a request from the customer-facing channel must not receive fraud scores or processing metadata, while a request from the internal channel must.

Each version of the specification produces a corresponding version of the test suite. The specification and the tests evolve together, and the tests are the mechanism by which the team knows that the new version still satisfies all the commitments of the previous versions while adding the new ones. Without this co-evolution, iteration is reckless: you might fix one problem while introducing three others. With it, iteration is disciplined: the test suite is the accumulating body of evidence that the system does what the specification says it does.
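Two of these evolutions can be sketched as code plus the tests derived from them. The sketch is illustrative only: the internal-only field names (`fraud_score`, `processing_metadata`), the channel label `"internal"`, and the choice to clamp `per_page` rather than reject the request are all assumptions.

```python
MAX_PER_PAGE = 100
SUMMARY_FIELDS = ("id", "payee_name", "amount", "status", "created_at")
# Hypothetical names for the fields that only the internal channel may see.
INTERNAL_ONLY_FIELDS = ("fraud_score", "processing_metadata")


def clamp_per_page(requested: int) -> int:
    """Version 2: enforce the page-size maximum (clamping is an assumption)."""
    return max(1, min(requested, MAX_PER_PAGE))


def to_list_item(payment: dict) -> dict:
    """Version 4: the list endpoint returns summary fields only."""
    return {field: payment[field] for field in SUMMARY_FIELDS}


def to_detail(payment: dict, channel: str) -> dict:
    """Version 4: the detail view hides internal fields from customer channels."""
    if channel == "internal":
        return dict(payment)
    return {k: v for k, v in payment.items() if k not in INTERNAL_ONLY_FIELDS}


payment = {
    "id": "p-1", "payee_name": "Jane Smith", "amount": 250.0, "status": "completed",
    "created_at": "2025-02-01T09:00:00Z", "sort_code": "20-30-40",
    "account_number": "12345678", "reference": "Rent - February",
    "fraud_score": 0.02, "processing_metadata": {"route": "faster-payments"},
}

# Tests accumulated across versions: earlier commitments still hold.
assert clamp_per_page(200) == 100
assert set(to_list_item(payment)) == set(SUMMARY_FIELDS)
assert "fraud_score" not in to_detail(payment, "customer")
assert "fraud_score" in to_detail(payment, "internal")
```

The assertions at the end are cumulative: the Version 2 pagination test keeps running alongside the Version 4 visibility tests.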

8. How Specifications Constrain AI Generation

Understanding the mechanism by which AI models use specifications explains both what specifications can guarantee and what they cannot.

A Large Language Model generates text by predicting the next token (a word or word-fragment) based on everything that came before it. The model calculates a probability for every token in its vocabulary and samples from that distribution. This is why LLM output is non-deterministic: the sampling process involves controlled randomness. When a human developer writes code, the output is broadly deterministic; the same developer, given the same task, will write more or less the same code. When an AI generates code, the output varies. Run the same prompt twice and you might get different field names, different data structures, different error handling patterns.

This variation is manageable when a human reviews every line. It becomes a serious engineering problem when AI-generated components must interoperate, when the output is part of an automated pipeline, or when the volume of generation exceeds what human review can sustain. If each generation might produce slightly different field names, different data types, or different error conventions, then integrating the outputs becomes a game of whack-a-mole.

Specifications solve this through a mechanism called constrained decoding. When a schema is supplied alongside the prompt to an inference stack that supports it, the token selection at each step is filtered: only tokens that would produce output consistent with the schema are eligible.

If the schema says the next field must be called customer_id and must be a string, tokens that would produce a different field name or a non-string value are excluded. The model retains its creative capacity; it can choose what value to produce. But it cannot violate the structure. The schema itself is generated by the AI from the natural language specification; the humans described the domain constraints, and the AI produced the JSON Schema that enforces them.
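The filtering step can be illustrated with a toy example. This is a deliberately simplified sketch, not how any particular inference engine is implemented: real systems mask scores over the whole vocabulary and track a grammar state as the output grows. The function name `constrained_sample` and the three-token distribution are invented for illustration.

```python
import random


def constrained_sample(probs: dict[str, float], allowed: set[str],
                       rng: random.Random) -> str:
    """Sample one token, considering only tokens the schema permits here."""
    masked = {tok: p for tok, p in probs.items() if tok in allowed and p > 0}
    if not masked:
        raise ValueError("no permitted token has non-zero probability")
    tokens = list(masked)
    weights = [masked[tok] for tok in tokens]
    # random.choices normalises the weights, so no manual renormalisation is needed.
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy next-token distribution at the point where a field name is expected.
probs = {'"customerName"': 0.5, '"customer_id"': 0.3, '"custId"': 0.2}
# The schema permits exactly one field name at this position.
allowed = {'"customer_id"'}

token = constrained_sample(probs, allowed, random.Random(0))
```

Although the unconstrained model would have preferred `"customerName"`, the mask leaves only `"customer_id"` eligible. Where the schema allows several continuations, such as the characters of a string value, the sampling remains free among them.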

The practical difference is immediate. Without a schema, you ask the model to generate a customer payment and it might return customerName instead of from_account_id, sortcode instead of sort_code, and amounts as strings with currency symbols instead of a structured object with separate value and currency fields. Each of those choices is plausible. Each of them breaks any downstream system expecting the contract generated from the specification in Section 1.

Without a schema constraint, the model might generate:

{
  "customerName": "Jane Smith",
  "sortcode": "20-30-40",
  "accountNo": "12345678",
  "amount": "£250.00",
  "paymentRef": "Rent - February"
}

Every field name is different from the contract. The amount is a string with a currency symbol. There is no payee nesting. This output would fail silently in any system expecting the schema that the AI generated from the Section 1 specification.

With that generated schema applied as a constraint, the model is forced to produce:

{
  "from_account_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "payee": {
    "name": "Jane Smith",
    "sort_code": "20-30-40",
    "account_number": "12345678"
  },
  "amount": {
    "value": 250.00,
    "currency": "GBP"
  },
  "reference": "Rent - February"
}

The field names, types, nesting, and structure are mechanically enforced. The model fills in the values; the schema guarantees the shape.

This matters for the iterative learning cycle because it separates two kinds of problems.

Structural problems (wrong field names, missing required fields, invalid types) are eliminated by the schema constraint. You never need to iterate the specification to fix structural errors, because they cannot occur.
Content problems (an account ID that does not correspond to a real account, a daily limit check applied backwards, a payment status transition that violates the state machine) remain, and these are the problems that drive meaningful specification iteration. They are also the problems that the test suite catches: a structural check confirms the shape of the data, but only a test derived from the specification can confirm that the business logic is correct.
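The separation between the two kinds of problem can be sketched with a hand-rolled check. The field names follow the constrained example above; treating every field as required, and the helper names `structural_errors` and `content_errors`, are assumptions made for illustration.

```python
def _is_number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def structural_errors(payload: dict) -> list[str]:
    """Shape check only: names, nesting, and types."""
    errors = []
    if not isinstance(payload.get("from_account_id"), str):
        errors.append("from_account_id: expected string")
    payee = payload.get("payee")
    if not isinstance(payee, dict):
        errors.append("payee: expected object")
    else:
        for field in ("name", "sort_code", "account_number"):
            if not isinstance(payee.get(field), str):
                errors.append(f"payee.{field}: expected string")
    amount = payload.get("amount")
    if not isinstance(amount, dict):
        errors.append("amount: expected object")
    else:
        if not _is_number(amount.get("value")):
            errors.append("amount.value: expected number")
        if not isinstance(amount.get("currency"), str):
            errors.append("amount.currency: expected string")
    if not isinstance(payload.get("reference"), str):
        errors.append("reference: expected string")
    return errors


def content_errors(payload: dict, known_accounts: set[str]) -> list[str]:
    """Content check: needs domain data that a schema does not hold."""
    if payload["from_account_id"] not in known_accounts:
        return ["from_account_id: no such account"]
    return []


unconstrained = {"customerName": "Jane Smith", "sortcode": "20-30-40",
                 "accountNo": "12345678", "amount": "£250.00",
                 "paymentRef": "Rent - February"}
constrained = {
    "from_account_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "payee": {"name": "Jane Smith", "sort_code": "20-30-40",
              "account_number": "12345678"},
    "amount": {"value": 250.00, "currency": "GBP"},
    "reference": "Rent - February",
}
```

Here `structural_errors(unconstrained)` reports several problems while `structural_errors(constrained)` reports none. Yet `content_errors(constrained, known_accounts)` still fails if the account ID is not in the set of real accounts: the shape is right and the content is wrong.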

9. The Specification as Organisational Memory

The version history of a specification is itself a knowledge asset, and many organisations do not realise this.

A team that has iterated through twelve versions of an API specification has encoded, in those twelve versions, everything they learned about the domain. Version 3 added pagination because the team learned about scale. Version 5 added rate limiting because the team learned about denial-of-service risks to the payments infrastructure. Version 8 separated the customer-facing and internal schemas because the team learned that fraud check metadata must not be exposed to mobile channels. Version 11 added webhook notifications for payment status changes because the team learned that polling was creating unnecessary load on the core banking platform. Each version change is a lesson learned. Each commit message, if the team writes good ones, is an explanation of what the team discovered and why they changed their mind.

Compare this to the alternative: an API with no specification, where the code is the only source of truth. A new team member reads the code and can see what it does. They cannot see why it does it that way. The rate limiting logic is there, but nothing explains what attack pattern triggered it. The channel-sensitive visibility rules are there, but nothing explains which regulatory requirement drove them. The knowledge is in the code, but the learning is invisible. When the original team members leave, the learning leaves with them.

A specification treated as a living, versioned artefact is an antidote to this knowledge loss. The specification changelog becomes the team’s institutional memory. And the test suite is that memory made executable: not just a record of what the team decided but a mechanism that enforces the decision continuously. The test for the daily payment limit does not merely document that the team decided to add the limit; it verifies, on every deployment, that the limit is still in effect. Institutional memory that is mechanically enforced does not decay the way documentation does.

Peter Drucker identified the defining challenge of knowledge work as the requirement that the worker must define the task before they can do it. But Drucker did not say the definition would be right the first time. He said the definition was the task. The specification is not a preliminary to the work. It is the work. And the version history, together with the test suite that enforces it, is the evidence that the work was done.

10. Governing AI Agents Through Specification

For agentic AI systems, where AI operates with increasing autonomy, the specification takes on a governance role that extends beyond development into operational control.

An autonomous agent’s capabilities are defined entirely by its tool specifications. Each tool specification describes what the tool does, what parameters it accepts, and what it returns. The specification defines the boundary of the agent’s action space. Add a tool specification, and the agent gains a new capability. Remove one, and the capability disappears. Tighten a constraint (restrict a database query tool to only the reporting database, rather than all databases), and the agent’s freedom contracts. Loosen a constraint (add a new database to the permitted list), and it expands.

This means that specification authoring is, increasingly, the activity through which humans govern AI behaviour. The quality of the governance is determined by the quality of the specification. A specification that is too loose gives the agent too much freedom: it might query sensitive databases, invoke expensive operations without limit, or take actions that are technically permitted but commercially inappropriate. A specification that is too tight prevents the agent from doing useful work: it cannot look up the information it needs to answer a customer’s question, or it cannot complete a transaction without human intervention at every step.

For example, a customer service AI agent might be given a tool specification for looking up payment status. The governance constraints are expressed in natural language:

### Tool: Look Up Payment Status

**Purpose**: Allow the customer service agent to check the current status of a payment.

**Input**: The payment ID.

**Channel restriction**: This tool operates in the customer channel only.
The agent cannot request internal-channel data, regardless of what the
customer asks.

**Visible fields**: payment ID, status, payee name, amount, date.

**Excluded fields**: fraud scores, sort codes, account numbers, processing
metadata, internal timestamps. These must never appear in the agent's
response to a customer.

**Status values**: pending, processing, completed, failed, returned.

The AI generates the MCP tool definition from this description: the channel field is hardcoded to "customer" in the enum, the output schema excludes the prohibited fields, and the status values are constrained to the five-value enum. The constraint is structural, not advisory. The agent does not need to be told “do not show fraud data”; the generated specification makes it impossible.
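
To make the idea of a structural constraint concrete, here is a rough sketch of what such a generated definition might look like, written as a Python dictionary. Everything in it is an assumption: the tool name, the field names, and the use of inputSchema and outputSchema keys, which follow the naming I understand MCP tool definitions to use but should be checked against the protocol specification. The to_customer_view helper is hypothetical and shows one way the output schema could be applied to an internal record.

```python
# Hypothetical tool definition, shaped loosely after an MCP tool entry.
# Names and structure are illustrative assumptions, not a generated artefact.
LOOKUP_PAYMENT_STATUS_TOOL = {
    "name": "lookup_payment_status",
    "description": "Check the current status of a payment (customer channel only).",
    "inputSchema": {
        "type": "object",
        "required": ["payment_id", "channel"],
        "additionalProperties": False,
        "properties": {
            "payment_id": {"type": "string"},
            # A one-value enum: the agent cannot ask for the internal channel.
            "channel": {"type": "string", "enum": ["customer"]},
        },
    },
    "outputSchema": {
        "type": "object",
        "required": ["payment_id", "status", "payee_name", "amount", "date"],
        "additionalProperties": False,
        "properties": {
            "payment_id": {"type": "string"},
            "status": {
                "type": "string",
                "enum": ["pending", "processing", "completed", "failed", "returned"],
            },
            "payee_name": {"type": "string"},
            "amount": {"type": "object"},
            "date": {"type": "string"},
        },
    },
}


def to_customer_view(record: dict) -> dict:
    """Project an internal payment record onto the fields the output schema allows."""
    visible = LOOKUP_PAYMENT_STATUS_TOOL["outputSchema"]["properties"]
    return {name: record[name] for name in visible if name in record}


internal_record = {
    "payment_id": "pay-001",
    "status": "processing",
    "payee_name": "Jane Smith",
    "amount": {"value": 250.00, "currency": "GBP"},
    "date": "2026-02-01",
    "fraud_score": 0.12,          # excluded field
    "sort_code": "20-30-40",      # excluded field
    "account_number": "12345678", # excluded field
}

print(to_customer_view(internal_record))
```

The excluded fields never reach the agent's response because the projection is driven by the schema, not by an instruction the agent could misread or ignore.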

Finding the right level of constraint is the same iterative design problem that applies to API specification, applied to higher stakes. Deploy the agent with a specification. Observe what it does. Identify where the constraints are too loose or too tight. Revise the specification. Redeploy. Each iteration is a cycle of learning about the boundary between useful autonomy and dangerous freedom. The practice is the same. The consequences of getting it wrong are more immediate. And the need for tests is even more acute: if the agent’s tool specification says it can only query the reporting database, a test must verify that attempts to query other databases are rejected. The test is the proof that the governance constraint is enforced, not merely declared.
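
The reporting-database example can be sketched as a guard plus the test that proves it. The function name run_query, the exception type, and the database names are hypothetical; a real tool would execute the query where the sketch only acknowledges it.

```python
# Assumed guard around a hypothetical database query tool.
PERMITTED_DATABASES = frozenset({"reporting"})


class ToolConstraintViolation(Exception):
    """Raised when the agent requests something its tool specification forbids."""


def run_query(database: str, sql: str) -> str:
    if database not in PERMITTED_DATABASES:
        raise ToolConstraintViolation(f"database '{database}' is not permitted")
    # A real tool would execute the query here; the sketch only acknowledges it.
    return f"executed on {database}"


def test_other_databases_are_rejected():
    for forbidden in ("customers", "fraud", "core_banking"):
        try:
            run_query(forbidden, "SELECT 1")
        except ToolConstraintViolation:
            continue
        raise AssertionError(f"{forbidden} should have been rejected")


def test_reporting_database_is_allowed():
    assert run_query("reporting", "SELECT 1") == "executed on reporting"


test_other_databases_are_rejected()
test_reporting_database_is_allowed()
```

If someone later loosens the constraint by adding a database to the permitted set, the first test fails until it is deliberately updated, which turns a silent governance change into a reviewed one.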

Open standards have emerged or matured to make this governance interoperable. Anthropic’s Model Context Protocol (MCP) defines a standard way for AI assistants to discover and invoke tools through specification-defined interfaces. Standards like OpenAPI for REST APIs, AsyncAPI for event-driven systems, and JSON Schema as the foundational constraint language enable AI-developed applications to interact with each other reliably, regardless of which team or which model built them. The standards do not replace the practice of specification. They make the practice portable: a specification written in an open standard is an artefact that any AI system, any team, and any tool can consume.

11. Specifications Across Boundaries

Everything described so far has assumed a single team working on a single specification. In practice, no specification exists in isolation. The payments team’s specification depends on the accounts team’s specification for balance information, the fraud team’s specification for risk assessment, the notifications team’s specification for alerting customers to payment status changes, and the regulatory reporting team’s specification for audit trail requirements. An enterprise is not a collection of independent specifications. It is an ecosystem of interdependent ones, and the quality of the ecosystem is determined not by the quality of any individual specification but by the coherence of the boundaries between them.

This is where the architectural decomposition described in Section 5 meets the practice of specification. The architect’s task of breaking the enterprise into bounded contexts is, in specification-driven development, the task of defining where one team’s specification ends and another’s begins. Each bounded context, in Eric Evans’s terminology, is an area within which a single model applies: a consistent set of terms, rules, and relationships. Within the payments context, “transaction” means a payment from one account to another. Within the trading context, “transaction” means the purchase or sale of a financial instrument. The word is the same. The meaning is different. The specification is what makes the meaning explicit and enforceable within each context.

The relationships between bounded contexts are themselves specification problems. Evans describes several patterns for these relationships. In a conformist relationship, one team adopts the upstream team’s specification wholesale: the notifications team accepts the payments team’s event schema as-is and builds around it. In a shared kernel, two teams co-own a subset of the specification: both the payments and accounts teams agree on a shared definition of “account summary” that appears in both their contracts. In an anti-corruption layer, a team translates between its own model and an upstream team’s incompatible one: the regulatory reporting team does not adopt the payments team’s domain model directly but instead maintains a translation specification that maps payment events into the regulatory format. Each of these patterns is a different kind of specification relationship, and each requires different governance.
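
Of the three patterns, the anti-corruption layer is the easiest to show in code, because it is literally a translation function. The sketch below is illustrative only: both data shapes and all field names are invented, and the conversion to minor units assumes a currency with two decimal places.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PaymentCompleted:
    """Event in the payments context (hypothetical fields)."""
    payment_id: str
    from_account_id: str
    amount: Decimal
    currency: str


@dataclass(frozen=True)
class RegulatoryRecord:
    """Record in the regulatory reporting context (hypothetical fields)."""
    transaction_reference: str
    debtor_account: str
    amount_minor_units: int
    currency: str


def to_regulatory_record(event: PaymentCompleted) -> RegulatoryRecord:
    """The one place where payments vocabulary is mapped to reporting vocabulary."""
    return RegulatoryRecord(
        transaction_reference=event.payment_id,
        debtor_account=event.from_account_id,
        # Assumes two decimal places, so 250.00 becomes 25000 minor units.
        amount_minor_units=int(event.amount * 100),
        currency=event.currency,
    )


event = PaymentCompleted("pay-001", "acc-123", Decimal("250.00"), "GBP")
print(to_regulatory_record(event))
```

If the payments team renames a field, only this function changes. The reporting model, and everything built on it, is insulated from the upstream change.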

The engineer’s role described in Section 5 finds its concrete expression here. Maintaining the semantic coherence of the integration fabric means ensuring that when the payments specification refers to account_id, it means the same thing as when the accounts specification exposes account_id: same format, same identifier scheme, same lifecycle assumptions. It means ensuring that security policies are consistent: if the accounts team’s specification requires OAuth2 bearer tokens with specific scopes, the payments team’s specification must use the same authentication mechanism when calling the accounts API. It means ensuring that error conventions are compatible: if one team returns errors in RFC 7807 Problem Details format and another returns errors as plain text messages, the integration between them will be fragile regardless of how well each individual specification is written.
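
Part of this coherence check can be automated. The sketch below compares how two specifications define the fields they share. The extracts are hypothetical and drastically simplified, and the check only covers declared type and format; lifecycle assumptions and identifier schemes still need human review.

```python
# Hypothetical extracts from two teams' specifications.
PAYMENTS_SPEC_FIELDS = {
    "account_id": {"type": "string", "format": "uuid"},
    "reference": {"type": "string"},
}
ACCOUNTS_SPEC_FIELDS = {
    "account_id": {"type": "string", "format": "uuid"},
    "account_type": {"type": "string"},
}


def conflicting_shared_fields(left: dict, right: dict) -> list[str]:
    """Names that appear in both specifications with different definitions."""
    shared = left.keys() & right.keys()
    return sorted(name for name in shared if left[name] != right[name])


print(conflicting_shared_fields(PAYMENTS_SPEC_FIELDS, ACCOUNTS_SPEC_FIELDS))

# Simulate drift: the accounts team switches to numeric identifiers.
drifted_accounts = dict(ACCOUNTS_SPEC_FIELDS, account_id={"type": "integer"})
print(conflicting_shared_fields(PAYMENTS_SPEC_FIELDS, drifted_accounts))
```

Run as a linting rule in a shared pipeline, a check like this surfaces the disagreement when the specification changes rather than when the integration fails.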

Contract testing frameworks like Pact provide the mechanical verification that specifications remain compatible across boundaries. Each consuming team defines the subset of the provider’s specification that it depends on: “I call the accounts API and expect to receive an account ID, available balance, and account type.” The provider runs these consumer contracts as part of its own test suite, ensuring that any change to its specification is checked against every consumer’s expectations before deployment. This is the cross-boundary equivalent of the test suite within a single specification: it makes the dependencies visible, the compatibility verifiable, and the regression catchable. Without contract testing, changing a specification becomes a coordination nightmare as the consumer count grows. With it, the dependencies are managed mechanically rather than through meetings and email chains.
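
The mechanism can be illustrated without any framework. The sketch below is a hand-rolled approximation of a consumer-driven contract check and deliberately does not use Pact's API; the consumer names, field names, and provider response are all invented for the example.

```python
# Hand-rolled illustration of consumer-driven contracts. This is not Pact's API.
# Each consumer declares only the fields (and types) it depends on.
CONSUMER_CONTRACTS = {
    "payments": {"account_id": str, "available_balance": float, "account_type": str},
    "notifications": {"account_id": str},
}


def accounts_provider_response() -> dict:
    """Stand-in for the accounts API response (hypothetical fields)."""
    return {
        "account_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
        "available_balance": 1200.50,
        "account_type": "standard",
        "opened_on": "2019-04-01",
    }


def broken_contracts(response: dict, contracts: dict) -> dict:
    """Map each consumer to the expected fields the response fails to satisfy."""
    failures = {}
    for consumer, expected in contracts.items():
        unmet = [
            field for field, field_type in expected.items()
            if not isinstance(response.get(field), field_type)
        ]
        if unmet:
            failures[consumer] = unmet
    return failures


print(broken_contracts(accounts_provider_response(), CONSUMER_CONTRACTS))

# Simulate a breaking change: the provider renames available_balance.
changed = accounts_provider_response()
changed["balance"] = changed.pop("available_balance")
print(broken_contracts(changed, CONSUMER_CONTRACTS))
```

The rename breaks the payments consumer and leaves the notifications consumer untouched, which is exactly the information the provider needs before deploying. Extra fields such as opened_on break nobody, because no consumer declared a dependency on them.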

Skelton and Pais’s Team Topologies model provides the organisational structure for managing this ecosystem. Stream-aligned teams own the specifications within their bounded context: the payments team owns the payment creation spec, the payment history spec, the payee management spec, and the associated test suites. Platform teams provide the shared infrastructure that makes specification-driven development consistent across the enterprise: the contract testing pipeline, the specification linting rules, the shared authentication and error-handling standards, the API gateway configuration that enforces rate limits and channel-based access controls. Enabling teams help stream-aligned teams that are new to specification-driven development build the practice: pairing with them through their first few iteration cycles, reviewing their specification diffs, helping them establish the domain-expert-plus-engineer pairing model from Section 4. The interaction modes between teams (collaboration for closely coupled work, X-as-a-service for stable interfaces, facilitating for capability building) map directly to the specification relationships between their bounded contexts.

The ecosystem of specifications, managed through these structures, becomes the enterprise’s executable architecture. Traditional enterprise architecture is documented in PowerPoint slides and Visio diagrams that describe how systems should relate to each other. Specification-driven architecture is documented in versioned, testable artefacts that describe how systems actually relate to each other, verified on every deployment by contract tests that confirm compatibility. The architecture does not drift from reality because the specifications are reality: they are the artefacts from which the implementations are generated and against which the implementations are tested. When the architecture needs to change (a new bounded context is carved out, a service is decomposed, a shared kernel is introduced), the change is made in the specifications and the contract tests confirm that the change is safe before any code is regenerated.

This is a fundamentally different model of enterprise development. The traditional model separates planning from execution: architects plan the target state, programme managers coordinate the transition, and delivery teams implement against the plan. The specification-driven model collapses the separation: the specification is the plan, the implementation is generated from it, and the contract tests verify that the ecosystem remains coherent as individual teams iterate within their bounded contexts. The coordination overhead that enterprises spend billions on (integration testing phases, release trains, change advisory boards reviewing deployment requests) is largely replaced by mechanical verification at the specification boundary. It is not eliminated entirely: the human judgement about where to draw boundaries, what shared standards to enforce, and how to manage breaking changes remains. But the mechanical coordination is handled by the specifications and their tests, freeing human attention for the decisions that require it.

12. The Industrialisation of SDD: From Practice to Ecosystem

The practice described in this article is not theoretical. It is being industrialised. In 2025, a convergence of tooling emerged that moves specification-driven development from a methodology that teams must implement themselves to an ecosystem with dedicated infrastructure. Three developments are particularly significant, each addressing a different layer of the problem.

GitHub Spec Kit provides the process layer. Released as an open-source toolkit in September 2025, Spec Kit formalises the SDD workflow into a CLI and a set of structured commands that work with multiple AI coding agents: GitHub Copilot, Claude Code, Gemini CLI, Cursor, and others. The workflow follows a phased sequence: /specify captures what the project should do and why, producing a spec.md. /plan translates that intent into a technical approach, recording architectural choices and dependencies in plan.md. /tasks breaks the plan into small, self-contained units of work, each with enough context for an AI agent to implement. Every change is version-controlled, so there is a visible trail from original intent through to resulting implementation.

Spec Kit introduces a concept it calls the constitution: a document that establishes non-negotiable principles for the project before any specification work begins. An organisation might define that all applications must be CLI-first, that a specific testing approach is mandatory, or that certain security patterns must be followed. The constitution operates as a constraint on the specification itself: the specification must satisfy the constitution, just as the implementation must satisfy the specification. This creates a layered traceability chain: organisational principles (constitution) → feature definition (specification) → technical plan → implementation tasks → code. The chain echoes the argument from Section 7: traceability is structural, not documentary. The links are maintained by the tooling, not by human discipline alone.

AWS Kiro addresses the IDE layer, embedding specification-driven development directly into the development environment rather than treating it as a separate process. Kiro is a VS Code fork that enforces a structured workflow: requirements first, then design, then implementation tasks. It generates user stories with acceptance criteria written in EARS (Easy Approach to Requirements Syntax) notation, a structured natural-language format originally developed at Rolls-Royce for airworthiness certification. EARS uses a small set of keywords (When, While, Where, If/Then) to constrain requirements into patterns that are precise enough to test against but readable enough for non-technical stakeholders:

WHEN the customer creates a new payment
  AND the payment amount exceeds the available balance
THEN the system SHALL reject the payment with a 422 response
  AND the response SHALL include the available balance
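
Because the requirement is this constrained, it maps almost line for line onto a test. The sketch below is one hedged reading of it in Python: the create_payment handler, the error code, and the 201 success response are assumptions, while the 422 status and the available balance in the response body come from the requirement itself.

```python
def create_payment(amount: float, available_balance: float) -> tuple[int, dict]:
    """Hypothetical handler returning (HTTP status, response body)."""
    if amount > available_balance:
        return 422, {"error": "insufficient_funds", "available_balance": available_balance}
    # The success response is an assumption; the requirement only covers rejection.
    return 201, {"status": "pending"}


def test_payment_exceeding_balance_is_rejected():
    # WHEN the customer creates a payment AND the amount exceeds the balance
    status, body = create_payment(amount=300.00, available_balance=250.00)
    # THEN the system SHALL reject the payment with a 422 response
    assert status == 422
    # AND the response SHALL include the available balance
    assert body["available_balance"] == 250.00


def test_payment_within_balance_is_accepted():
    status, _ = create_payment(amount=100.00, available_balance=250.00)
    assert status == 201


test_payment_exceeding_balance_is_rejected()
test_payment_within_balance_is_accepted()
```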

The significance of Kiro is not just the tooling but the positioning. AWS explicitly contrasts spec-driven development with what is often called “vibe coding”: the practice of throwing prompts at an AI model and accepting whatever it produces. Kiro’s thesis is that the specification is not overhead that slows down development; it is the mechanism that makes AI-generated code maintainable, reviewable, and trustworthy. The pairing model described in Section 4 finds its tool support here: Kiro’s spec workflow creates a shared artefact that both domain experts and technical staff can review and revise before any code is generated. The tool also introduces “hooks,” event-driven agents that trigger automatically when files change: save a React component and the tests update; modify an API endpoint and the documentation regenerates. This automation closes the gap between specification revision and implementation update that, in the manual practice, requires discipline to maintain.

JUXT’s Allium operates at the deepest layer: the specification language itself. Where Spec Kit provides process and Kiro provides environment, Allium provides a purpose-built behavioural specification language designed for LLM consumption. Allium addresses a problem that structural schemas cannot: the distinction between what the code does and what it should do. Code captures implementation, including bugs and expedient decisions. An LLM navigating a codebase treats all of it as intended behaviour. A specification written in markdown can capture intent, but markdown provides no framework for surfacing ambiguities and contradictions. You can write “users must be authenticated” in one section and “guest checkout is supported” in another without the format highlighting the tension.

Allium’s formal syntax makes contradictions visible. Its language describes events with their preconditions and resulting outcomes, deliberately excluding implementation details like database schemas and API designs. The specification operates purely at the level of observable behaviour. Two processes feed its evolution: elicitation works forward from intent through structured conversations with stakeholders; distillation works backward from existing implementation to capture what the system actually does, including behaviours that were never explicitly decided. When these two directions diverge, the divergence is information: either the implementation drifted from intent, or the specification was naive. Either might need to change.

The practical demonstration is striking. In a published case study, a JUXT engineer used Allium specifications to direct Claude in building a distributed system with Byzantine fault tolerance, strong consistency, and crash recovery. Three thousand lines of behavioural specification produced approximately 5,500 lines of production Kotlin and 5,000 lines of tests over a weekend. The specification evolved alongside the code across 64 commits; when load testing revealed that a component’s watermark advancement needed rethinking, the specification was revised first and the implementation followed. The specification was the site of design thinking. The code was its expression.

These three tools represent different bets on the same underlying thesis: that specification is the bottleneck, and that tooling the specification practice is more valuable than tooling the implementation. Spec Kit bets on process standardisation across agents. Kiro bets on IDE integration that makes specification the default workflow. Allium bets on a purpose-built language that captures intent at a level of precision that natural language and structural schemas cannot reach. They are not competitors. They are complementary layers of an emerging stack. A team might use EARS notation in Kiro to capture requirements, Spec Kit’s phased workflow to manage the specification-to-implementation pipeline, Allium to express the behavioural constraints that structural schemas cannot capture, and the open standards from Section 10 (OpenAPI, JSON Schema, MCP) to ensure the resulting artefacts are interoperable.

The existence of this ecosystem is itself evidence of the argument made throughout this article. If specification were merely documentation, there would be no market for these tools. They exist because teams have discovered, through painful experience, that the quality of AI-generated output is bounded by the quality of the specification that governs it. The tools do not replace the practice. They make the practice accessible to teams that do not have the expertise or the discipline to implement it from scratch.

13. The Corruption Risk: When Specification Becomes Compliance

Drucker identified the risk decades before AI existed. He invented Management by Objectives (MBO) with a specific intent: to enable decentralisation and autonomy. If people understood the objectives clearly, they could determine for themselves how to achieve them. MBO was designed to replace command-and-control with trust-and-clarity. What happened in practice was the opposite. MBO was corrupted into top-down quotas, cascaded KPIs, and surveillance. The objectives were imposed, not negotiated. The autonomy that was supposed to follow from clear objectives was never granted.

SDD faces the same corruption risk. A specification, properly understood, defines what the system should do, leaving the how to the AI or the developer. In a sense, this is MBO applied to software. If specifications become rigid templates imposed by a governance function, if they are evaluated by volume rather than quality, if writing them becomes a compliance exercise detached from the team’s actual understanding of the domain, then the templates will be completed and the learning will not occur.

The pairing model described in Section 4 is itself a safeguard against this corruption. When the domain expert and the technical person describe the system together and let the AI generate the specification, the specification cannot easily become disconnected from business reality, because the person who understands the business reality is in the room. When the specification is generated by a technical person alone from a requirements document handed over a wall, the disconnection is almost guaranteed.

The version history will reveal which pattern is operating. In a genuinely iterative team, the early versions of the specification will show substantial revisions: entire sections rewritten, new concepts introduced, fields removed after the team realises they were unnecessary. In a compliance-driven team, the versions will show minor adjustments: formatting fixes, field descriptions added to satisfy a linter, boilerplate sections copied from a template. The shape of the revision history is the diagnostic.

G.E.M. Anscombe, the philosopher (covered elsewhere in this series), provides a test. A developer who writes a specification because they understand what they are building, and can answer “Why?” at each level with reasons that connect to the purpose of the system, is acting with what Anscombe calls practical knowledge. A developer who fills in a specification template because the governance framework requires it is also acting intentionally, but their intention is to comply with the process, not to build the right thing. The form is identical. The intention is entirely different. One specification will evolve through genuine learning. The other will remain static.

The same test applies to the test suite. Ask the person who wrote the tests: “Why does this test exist?” If the answer traces to a specific business rule through a specific specification clause, the tests are an evidence chain. If the answer is “the coverage tool says we need 80%,” the tests are theatre.

14. The Limits of What Can Be Specified

Intellectual honesty requires naming what specifications cannot do, even when practised iteratively.

The specification-intent gap. A specification can be syntactically valid, semantically consistent, version-controlled, and fully tested, and still miss the point. The API might do exactly what the specification describes and still not solve the customer’s problem. Formal correctness is not the same as rightness. The judgement of whether the right thing is being built remains, irreducibly, a human capability. This is why the pairing model matters: the domain expert in the room is the ongoing check against building the wrong thing precisely.
Specification expressiveness. Some constraints are straightforward to describe in natural language but difficult for the AI to express in structural schema languages. Cross-field dependencies (“if payment status is ‘completed’, then cleared_at timestamp is required”) can be generated but the resulting schema syntax is unwieldy. More complex business rules (“daily payment limit is £25,000 for standard accounts, £100,000 for premium”) and temporal constraints (“new payee cooling-off period is 24 hours from payee creation”) push beyond what structural schemas handle natively. This is why the natural language specification matters even after the AI has generated the technical artefacts: behavioural specifications in Given-When-Then format, property-based tests, and business rule engines supplement structural schemas for these cases. The practice of specification is broader than any single schema language; the test suite can verify constraints that the schema cannot express.
Over-specification. Every constraint is a hypothesis: “I believe these are all the valid values.” Hypotheses can be wrong in both directions. Too few constraints allow invalid output. Too many constraints reject valid output. The balance between constraint and flexibility requires judgement, and the only way to calibrate that judgement is through iteration: observe what the constraints permit and exclude, and adjust.
Evolution at scale. A specification with fifty downstream consumers cannot be revised as freely as a specification with one. As the number of systems depending on a specification grows, the cost of revision increases. Semantic versioning provides a discipline, and contract testing frameworks such as Pact make the dependencies visible: each consumer defines the subset of the specification it depends on, and changes that would break a consumer are caught before deployment. But the coordination challenge is real, and it is the reason that iterative specification works best when started early, before the consumer count grows.
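
The expressiveness limit is easier to see in code. The two rules quoted above, the cross-field dependency and the daily payment limit, are awkward in a structural schema but trivial as executable checks. The sketch below is illustrative: the function names and account type labels are assumptions, and treating the limit as inclusive is a guess that the domain expert would need to confirm.

```python
from datetime import datetime, timezone
from typing import Optional

# Limits quoted in the text; the account type labels are assumed identifiers.
DAILY_LIMITS_GBP = {"standard": 25_000, "premium": 100_000}


def within_daily_limit(account_type: str, spent_today: float, amount: float) -> bool:
    """Assumes the limit is inclusive: reaching exactly the limit is allowed."""
    return spent_today + amount <= DAILY_LIMITS_GBP[account_type]


def cleared_at_is_consistent(status: str, cleared_at: Optional[datetime]) -> bool:
    """Cross-field rule: a completed payment must carry a cleared_at timestamp."""
    return status != "completed" or cleared_at is not None


# The same payment is valid for one account type and invalid for another.
print(within_daily_limit("standard", spent_today=24_000, amount=1_001))
print(within_daily_limit("premium", spent_today=24_000, amount=1_001))

# A completed payment without a timestamp is structurally fine but wrong.
print(cleared_at_is_consistent("completed", None))
print(cleared_at_is_consistent("completed", datetime(2026, 2, 1, tzinfo=timezone.utc)))
```

Writing the rule as code also forces the questions the schema never asks: is the limit inclusive, and does "today" mean the calendar day or a rolling 24 hours? Those are domain questions, and they belong back in the natural language specification.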

15. Getting Started: Patterns for Iterative Specification

For teams beginning the practice, the most common mistake is attempting to describe every requirement perfectly before generating anything. The iterative approach is both faster and more effective.

Form the pair first. Before starting, identify the domain expert and the technical person who will describe the system together. If the domain expert is not available, do not start. A technically elegant specification of the wrong requirements is worse than no specification at all, because it produces a test suite that validates the wrong behaviour with perfect confidence.

Start rough, refine through generation. Describe the minimum viable requirement: the core capability, the essential data, the most obvious constraints. Let the AI generate the specification and the implementation. Use the output to discover what the description is missing. Revise the description. Regenerate. Repeat. Three iterations of this cycle will produce a better specification than three weeks of upfront analysis.

A minimum viable description for a new capability might be:

## Feature: Add Payee

### User Story
As a customer, I want to add a payee to my account
so that I can make payments to them.

### Data
A payee has a name, sort code, and account number.
All three fields are required.

This is deliberately incomplete. There are no validation rules on sort code or account number because the team has not yet described the format. There is no mention of duplicate detection, name verification against the bank’s payee directory, or the cooling-off period for new payees. But it is enough for the AI to generate a specification and an implementation, see what comes back, and start the learning cycle. When the team sees that the generated system accepts "abc" as a sort code, they will realise they need to describe the format. When they see that the same payee can be added twice, they will realise they need to describe deduplication. Each gap in the output is a prompt for domain clarity they had not yet articulated.
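
A second iteration might close the two gaps just described. The sketch below is one possible shape for it, under stated assumptions: the sort code and account number formats are inferred from the example values used earlier (20-30-40 and 12345678), the PayeeBook class is hypothetical, and treating sort code plus account number as the duplicate key is a guess the domain expert would need to confirm.

```python
import re

SORT_CODE = re.compile(r"[0-9]{2}-[0-9]{2}-[0-9]{2}")  # assumed format, e.g. 20-30-40
ACCOUNT_NUMBER = re.compile(r"[0-9]{8}")               # assumed format, e.g. 12345678


class PayeeBook:
    """Hypothetical in-memory payee list for one customer."""

    def __init__(self) -> None:
        self._payees = set()

    def add(self, name: str, sort_code: str, account_number: str) -> list[str]:
        """Return validation errors; an empty list means the payee was added."""
        errors = []
        if not name.strip():
            errors.append("name is required")
        if not SORT_CODE.fullmatch(sort_code):
            errors.append("sort code must look like NN-NN-NN")
        if not ACCOUNT_NUMBER.fullmatch(account_number):
            errors.append("account number must be 8 digits")
        # Assumed duplicate key: the same sort code and account number.
        if not errors and (sort_code, account_number) in self._payees:
            errors.append("payee already exists")
        if not errors:
            self._payees.add((sort_code, account_number))
        return errors


book = PayeeBook()
print(book.add("Jane Smith", "abc", "12345678"))
print(book.add("Jane Smith", "20-30-40", "12345678"))
print(book.add("Jane Smith", "20-30-40", "12345678"))
```

The first call is rejected for its sort code, the second succeeds, and the third is rejected as a duplicate. Each of those behaviours now exists because someone described it, not because the model happened to guess it.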

Write the tests before the second iteration. The first iteration is exploratory: see what the AI produces, identify the obvious gaps. From the second iteration onward, every specification clause should have a corresponding test. The test is the assertion that the clause is satisfied. When you revise the specification, the first question is: what new tests does this revision require, and do any existing tests need to change? The test suite grows with the specification, and the two artefacts are always in sync.
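
Keeping the two artefacts in sync can itself be checked. The sketch below assumes a convention the article does not prescribe: each specification clause has an identifier, and each test declares which clause it asserts. The identifiers and test names are invented for illustration.

```python
# Hypothetical clause identifiers and test names.
SPEC_CLAUSES = {"PAYEE-001", "PAYEE-002", "PAYEE-003"}

TESTS_BY_CLAUSE = {
    "PAYEE-001": ["test_all_three_fields_required"],
    "PAYEE-002": ["test_sort_code_format"],
    "PAYEE-009": ["test_legacy_nickname_field"],
}


def untested_clauses(clauses: set, tests_by_clause: dict) -> list[str]:
    """Clauses with no test asserting them."""
    return sorted(c for c in clauses if not tests_by_clause.get(c))


def orphaned_tests(clauses: set, tests_by_clause: dict) -> list[str]:
    """Tests that point at a clause no longer in the specification."""
    return sorted(
        test
        for clause, tests in tests_by_clause.items()
        if clause not in clauses
        for test in tests
    )


print(untested_clauses(SPEC_CLAUSES, TESTS_BY_CLAUSE))
print(orphaned_tests(SPEC_CLAUSES, TESTS_BY_CLAUSE))
```

An untested clause is a claim with no evidence. An orphaned test is evidence for a claim nobody is making any more. Both are worth surfacing in review before the next iteration begins.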

Use specification diffs as learning reviews. In code review, teams review implementation changes. In specification-driven development, the more valuable review is the specification diff: what changed between versions, and why? Make specification changes a first-class review artefact. Require a brief explanation of each change. Over time, the specification changelog becomes the team’s institutional memory.

Separate structure from behaviour. Structural descriptions (what shape the data takes) and behavioural descriptions (what happens when the system processes it) are complementary. Describe the data constraints in the specification markdown. Describe the business logic as Given-When-Then scenarios that the AI can use to generate both the implementation and the test suite:

Scenario: Payment to new payee within cooling-off period
  Given a payee added to the customer's payee list 6 hours ago
  When the customer creates a payment to that payee
  Then the payment is accepted with status "held"
  And the response body contains "new payee cooling-off period"
  And the response body contains the release time 24 hours after payee creation

Both can be fed to AI models as context for generation. Both can be validated through automated testing. Together they cover the two dimensions of system definition. Separately, each leaves gaps that the other fills.
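
The cooling-off scenario above translates directly into an executable test. The sketch below is one hedged rendering of it: the handler name, the response field names, and the "pending" status for payments outside the window are assumptions, while the "held" status, the message text, and the 24-hour release time come from the scenario.

```python
from datetime import datetime, timedelta, timezone

COOLING_OFF = timedelta(hours=24)


def create_payment_to_payee(payee_created_at: datetime, now: datetime) -> dict:
    """Hypothetical handler: hold payments made inside the cooling-off window."""
    release_time = payee_created_at + COOLING_OFF
    if now < release_time:
        return {
            "status": "held",
            "reason": "new payee cooling-off period",
            "release_time": release_time.isoformat(),
        }
    # Behaviour outside the window is an assumption, not part of the scenario.
    return {"status": "pending"}


def test_payment_to_new_payee_within_cooling_off_period():
    now = datetime(2026, 2, 1, 12, 0, tzinfo=timezone.utc)
    # Given a payee added to the customer's payee list 6 hours ago
    payee_created_at = now - timedelta(hours=6)
    # When the customer creates a payment to that payee
    response = create_payment_to_payee(payee_created_at, now)
    # Then the payment is accepted with status "held"
    assert response["status"] == "held"
    # And the response contains the cooling-off message and the release time
    assert response["reason"] == "new payee cooling-off period"
    assert response["release_time"] == (payee_created_at + timedelta(hours=24)).isoformat()


test_payment_to_new_payee_within_cooling_off_period()
```

Passing the current time in as a parameter, rather than reading the clock inside the handler, is what makes the temporal rule testable without waiting 24 hours.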

Version specifications, tests, and code together. When the specification, the test suite, and the implementation live in the same repository, the same pull request can change all three, the same review process covers all three, and the version history shows the co-evolution of intent (specification), evidence (tests), and realisation (code). This co-location is what makes iteration practical and traceability natural. Tools like GitHub Spec Kit and AWS Kiro provide scaffolding for this co-location; Spec Kit’s phased workflow (/specify, /plan, /tasks) and Kiro’s structured spec-to-implementation pipeline both enforce the discipline of specification-first development. For teams that need to express behavioural constraints beyond what structural schemas can capture, JUXT’s Allium provides a dedicated language for formalising intent in a form that LLMs can reference reliably.

16. So what?

You might wonder what a technical article like this is doing in a series about shaping change in organisations. The answer is simple: how we build, and the tools we use to get clear about exactly what we need to build, have changed forever. Just as the industrial revolution produced thinking like Weber’s, and twentieth-century mechanisation the ideas of Talcott Parsons, so this new AI revolution necessitates new thinking about how the means of production shape the means of change.

Further Reading

Eric Evans: Domain-Driven Design: Tackling Complexity in the Heart of Software - The foundational text on bounded contexts, ubiquitous language, and context mapping. Provides the conceptual framework for decomposing enterprises into coherent specification boundaries. A future article in this series will explore DDD’s implications for specification-driven development in depth.

Matthew Skelton and Manuel Pais: Team Topologies: Organizing Business and Technology Teams for Fast Flow - The organisational model for aligning teams to bounded contexts. Describes stream-aligned, platform, enabling, and complicated-subsystem team types and their interaction modes. A future article in this series will explore how Team Topologies shapes the organisation of specification-driven teams.

ThoughtWorks: Spec-Driven Development - Analysis of SDD as an emerging practice, including the maturity progression from spec-first through spec-led to spec-as-source development.

OpenAI: Introducing Structured Outputs - How JSON Schema is used in constrained decoding to guarantee structural validity in AI-generated content.

GitHub: Spec Kit - Open-source toolkit for specification-driven development. Agent-agnostic CLI with structured commands (/specify, /plan, /tasks) that work across Copilot, Claude Code, Gemini CLI, Cursor and other AI coding agents.

AWS: Kiro - Agentic IDE built on Code OSS that embeds specification-driven development into the development environment. Generates EARS-notation acceptance criteria, technical designs and implementation task lists from natural language requirements.

JUXT: Allium - LLM-native behavioural specification language. Describes events, preconditions and outcomes in formal syntax designed for AI consumption. Includes elicitation and distillation workflows for building and extracting specifications.

Alistair Mavin: EARS Notation - The Easy Approach to Requirements Syntax. Keyword-based patterns (When, While, Where, If/Then) for writing precise, testable natural-language requirements. Originally developed at Rolls-Royce for airworthiness certification; adopted by Kiro for AI-assisted specification.

Disclaimer

Anscombe: Why is "What are we Doing?" Such a Hard Question?

Justin Arbuckle — Sat, 02 May 2026 07:01:30 GMT

Another philosophical interlude…

Ask anyone in your organisation what they are doing. You will get an answer. The developer will say “building the new API.” The product manager will say “delivering the Q3 roadmap.” The transformation lead will say “driving AI adoption.” Now ask each of them why. Not “what is the business case”; they can recite that. Ask them why in a way that connects what they are doing right now, today, at their desk, to what the organisation is trying to achieve. Ask them to trace the chain: I am doing this in order to achieve that, which contributes to this larger thing, which serves that purpose. Most will stall within two links. Not because they are incompetent, but because the organisation has never required the chain to exist. The action happens. The reasons are somewhere else, in a strategy deck nobody re-reads, in an OKR nobody believes, in a vision statement composed by people who have never done the work it describes. The action and the intention have been separated, and nobody noticed because the work kept getting done.

G.E.M. Anscombe, the British philosopher whose 1957 monograph Intention is widely regarded as the most important treatment of human action since Aristotle, explains why this separation matters and what it costs. Michael Bratman, whose planning theory of intention built on Anscombe’s foundations to explain how intentions structure coordination over time and between people, explains why the separation is especially devastating for organisations attempting to act together.

Together, Anscombe and Bratman provide the sharpest philosophical account of a problem this series has circled from multiple directions: the problem of knowing what you are doing, and why, in a way that makes your action genuinely yours.

Anscombe’s Intention is a dense, technically demanding work that engages with Aristotle, Aquinas, and Wittgenstein, and has generated decades of specialist debate about practical knowledge, non-observational self-knowledge, and the metaphysics of action. Bratman’s planning theory draws on and responds to Davidson, Searle, and Gilbert, and extends into contested territory about collective intentionality and institutional agency. This article focuses on the concepts most directly relevant to anyone leading organisational transformation.

1. Intention Is Not a Mental State: It Is Knowing What You Are Doing

The natural assumption is that an intention is something that happens inside your head before you act. You form a plan, you decide, and then you execute. Intention is the mental event; action is the physical consequence. Anscombe dismantles this picture completely.

Her central claim is that intentional action is action that is known by the agent under a description for which a particular sense of the question “Why?” has application. This requires unpacking.

When you do something, many true descriptions apply simultaneously. You are moving your fingers. You are typing. You are writing a specification. You are contributing to the Q3 delivery milestone. All of these may be true at the same moment. But they are not all intentional in the same way.

An action is intentional under a specific description if and only if the agent can answer “Why are you doing that?” with a reason, not merely a cause.

The distinction between reasons and causes is critical. If someone startles you and you knock over a cup, you can explain what happened: “The noise made me jump.” But this is a cause, not a reason. You did not knock over the cup in order to achieve anything. The question “Why did you knock over the cup?” does not have the right kind of answer. By contrast, if you are pumping water in order to replenish the house’s supply, and you know that the water has been poisoned, then you are poisoning the inhabitants intentionally, because you can trace the “Why?” chain: I am moving my arm in order to pump water, in order to replenish the supply, which will poison the inhabitants. Each link answers the question “Why?” by pointing to the next. This is Anscombe’s means-end order, and it is the structure that makes an action intentional.

Drucker’s insistence that the knowledge worker must define the task before they can do it is possibly a management expression of Anscombe’s philosophical point. If the worker cannot answer “Why?” with a reason that connects to a larger purpose, the work is not intentional in Anscombe’s sense. It may be caused by the habitual routines that Bourdieu describes, the practical consciousness that Giddens identifies, or the theories-in-use that Argyris diagnoses. But caused action, however skilled, is not the same as intentional action. And the difference determines whether the organisation is doing what it thinks it is doing.

2. Practical Knowledge: You Know What You Are Doing Without Watching Yourself Do It

Anscombe’s most radical contribution is her account of practical knowledge. When you act intentionally, you know what you are doing. This sounds obvious, but Anscombe means something very specific. Your knowledge of your own intentional action is not based on observation. You do not know that you are writing a specification by watching your hands type, any more than you know the position of your own limbs by looking at them. You know it because you are doing it; the knowledge is, in Anscombe’s phrase, “the cause of what it understands.” This is not efficient causation (in the sense of Aristotle), nor a mental event pushing a physical one. It is formal causation: the knowledge gives the action its form, its intelligibility as a unified course of activity directed at an end.

When practical knowledge fails, the mistake is one of performance, not of judgment. If I say “I am cutting a straight line” and the line comes out crooked, the error is in the cutting, not in my knowledge of what I am doing. Contrast this with observational knowledge: if I say “there are twelve eggs in the fridge” and there are eleven, the error is in my judgment. This asymmetry is the hallmark of practical knowledge. It is knowledge that is answerable to the world in a different way than observation is.

This connects directly to what Csikszentmihalyi describes as the phenomenology of flow. In the flow state, the agent is fully absorbed in the activity, knowing what they are doing without stepping outside the action to observe it. The merging of action and awareness that Csikszentmihalyi identifies as a condition of optimal experience is, in Anscombe’s terms, the experience of practical knowledge operating without interruption.

When a governance process forces the developer to stop, document their reasoning, and wait for approval before continuing, it breaks the practical knowledge by inserting an observational requirement into what should be a continuous intentional activity. The developer is forced to switch from knowing-by-doing to knowing-by-reporting, and the flow is destroyed.

Weick’s sensemaking sits in productive tension with Anscombe here. Weick’s famous formula, “How can I know what I think until I see what I say?”, suggests that understanding is retrospective: you act first, then interpret what you did. Anscombe insists that the agent does know what they are doing, concurrently, as they do it. But the knowledge is of a “whole-in-progress”; it encompasses what you are doing and why, even before the action is complete.

But Weick and Anscombe are not contradicting each other.

Weick describes how meaning emerges retrospectively from action. Anscombe describes how intention is present throughout the action as its formal structure. You can know what you are doing (intention) before you fully understand what it means (sensemaking). This distinction matters enormously for transformation: teams can be acting intentionally, with clear practical knowledge of the specification they are writing, without yet understanding the broader significance of, for instance, specification-driven development for their organisation. The intention is present. The sensemaking comes later.

3. Why the Same Programme Can Be Both Transformation and Preservation

Anscombe’s most practically useful insight for organisations is that the same action can be intentional under one description and unintentional under another. Her example is vivid: a man pumping water is simultaneously moving his arm, operating the pump, replenishing the water supply, and poisoning the inhabitants. These are not four actions but one, described in four ways. The action is intentional under each description that the agent can connect to a reason via the “Why?” chain. If the man does not know the water is poisoned, then “poisoning the inhabitants” is a true description of what he is doing, but it is not intentional.

A transformation programme has many true descriptions. It is consuming budget. It is producing training materials. It is generating governance artefacts. It is creating new roles. It is demonstrating executive commitment. It is changing how software gets built. All of these may be true simultaneously. But under which descriptions is the programme intentional? Under which descriptions can the people involved answer the “Why?” question with reasons that connect to the stated purpose?

Argyris’s distinction between espoused theory and theory-in-use receives a sharper formulation here. Espoused theory is the description the organisation claims for its action: “We are transforming through AI.” Theory-in-use is the description under which the action is actually intentional: “We are protecting existing legacy processes.” The gap between them is not hypocrisy. It is a failure of intention. The people involved are acting for reasons; the reasons simply do not connect to the espoused purpose. And because the actual reasons are, in Argyris’s term, undiscussable, the gap persists. Nobody asks “Under which description is what we are doing intentional?” because asking the question would surface the answer that nobody wants to hear.

4. Intention vs. Prediction: Is Your Strategy a Commitment or a Forecast?

Anscombe draws a sharp distinction between an expression of intention and a prediction. “I am going to take a walk” is typically an expression of intention. “I am going to be sick” is typically a prediction. The sentence “I am going to fail this exam” is ambiguous: it could be a prediction (based on evidence of poor preparation) or an expression of intention (a deliberate plan to fail, perhaps to annoy one’s parents). The difference lies not in the grammar but in the justification. Predictions are justified by evidence that something will happen. Expressions of intention are justified by reasons for making it happen.

This distinction cuts through one of the most persistent confusions in organisational strategy. Most AI roadmaps are presented as expressions of intention: “We will deploy AI-augmented development across all teams by Q4.” But examine the justification. Is it grounded in reasons for action, a connected “Why?” chain from present activity to desired outcome? Or is it grounded in evidence, market trends, competitor behaviour, analyst predictions, the assumption that this is where things are heading whether the organisation acts or not? If the latter, the roadmap is not an expression of intention. It is a prediction dressed in the language of commitment. And predictions, unlike intentions, do not structure action. They describe a future that may or may not arrive. Intentions create the future they describe, because they commit the agent to the means-end chain that produces it.

Mintzberg’s distinction between intended and emergent strategy maps directly onto this. An intended strategy that is actually a prediction will produce what Mintzberg calls unrealised strategy: a plan that was never connected to the reasons-for-action that would have made it executable. Emergent strategy, by contrast, is intention discovered through action. Weick’s retrospective sensemaking is the mechanism by which emergent action becomes recognised as intentional: the organisation acts, observes what it did, and reconstructs a reasons-chain that gives the action the form of intention. This is not dishonesty. It is how practical knowledge works when the domain is too complex for the full means-end chain to be specified in advance.

Stacey would press further. If strategic plans are gestures that call forth unpredictable responses, then the plan is neither pure intention nor pure prediction. It is an intended gesture whose actual meaning will be determined by the interaction it provokes. Anscombe’s framework helps clarify what Stacey is saying: the leader’s gesture is intentional under the description the leader can articulate (“I am initiating AI transformation”). But the organisational response is intentional under descriptions the leader cannot predict. The junior developer who interprets the announcement as permission to stop learning a new language (because AI will write the code) is acting intentionally, for reasons, under a description the strategy team never considered.

5. Shared Intention: What It Actually Takes to Act Together

Michael Bratman extends the philosophy of intention from individual action to collective agency.

His question is deceptively simple: what makes the difference between two people who happen to be walking in the same direction and two people who are walking together? His answer is shared intention: a structure of interconnected individual intentions in which each participant intends that the group does the thing, and each participant’s subplans mesh with those of the others.

Bratman’s conditions for shared intention are demanding. For you and me to share an intention to do something together, three things must hold.

Each of us must individually intend that we do it. Not that I do my part and you do yours; each of us must intend the joint action.
Each of us must intend that the joint action proceeds in accordance with both our intentions and through meshing subplans.
All of this must be common knowledge between us.

When these conditions hold, the shared intention serves three functions: it coordinates our activities, it coordinates our planning, and it structures our bargaining when we disagree about how to proceed.

Now consider your organisation’s transformation. Does it meet Bratman’s conditions? Does each team intend that the organisation transforms, or does each team intend only to complete its own deliverables? Are the subplans meshing, or are they developed in isolation by different departments using different assumptions about what “AI transformation” means? Is there genuine common knowledge of each participant’s intentions, or is there a strategy document that everyone has read and nobody has internalised?

Many organisational “alignments” fail the meshing subplans condition. The infrastructure team plans to build a platform. The product team plans to ship features. The governance team plans to manage risk. Each plan is rational on its own terms. But the plans are developed in parallel, with different time horizons, different assumptions about dependencies, and different interpretations of what the transformation requires. They do not mesh. There is no shared intention. There is parallel individual intention, dressed in the language of collective commitment.

Fayol’s coordination function, the continuous effort to harmonise the activities of different departments, is the management practice that Bratman’s philosophy explains. Coordination is not communication. It is the work of ensuring that subplans actually mesh: that what the infrastructure team is building is what the product team needs, on the timeline the governance team can support, for reasons that connect to a shared “Why?” chain. Mintzberg would add that this meshing cannot be fully designed in advance; it emerges through mutual adjustment, which is precisely the coordination mechanism that operates when work is too complex for any other form of standardisation.

Bratman’s later work on institutional agency deepens the challenge.

Institutions can have intentions, he argues, but institutional intention is decoupled from reasons in a way that individual intention is not.

An institution can intend something through its rules and procedures without any individual member holding that intention for reasons of their own. This is Weber’s iron cage given philosophical heft: the bureaucratic system produces intentional action at the institutional level precisely by eliminating the need for individual-level intention. The system runs. The individuals comply. Nobody needs to know why.

6. From Habitus to Intention: The Transformation That Specification Demands

The deepest connection in this article is between Anscombe’s practical knowledge and Bourdieu’s habitus. Both describe action that the agent “knows” in some sense but that operates without the kind of deliberate, articulated reasoning that we typically associate with intention. But Anscombe and Bourdieu locate this knowledge differently, and the difference is the key to understanding what transformation actually demands.

Anscombe’s agent who acts with practical knowledge can answer “Why?” with reasons. The knowledge is non-observational, but it is knowledge: the agent is aware of what they are doing and can articulate the means-end chain that gives their action its intentional character. Bourdieu’s agent who acts from habitus typically cannot. The habitus generates practice “below the threshold of articulation.” The experienced developer who reaches for code rather than specification is not merely acting for a reason they can state; they are acting from a disposition so deeply inscribed that it operates before the question “Why?” can even be posed. In Anscombe’s terms, habitus-driven action is closer to caused behaviour than to intentional action. The developer’s fingers produce code much in the way a startled person knocks over a cup: not for a reason, but because of a disposition.

Anscombe provides a warning that most transformation programmes ignore. Practical knowledge is the agent’s own knowledge. It is not instruction-following. It is not compliance with someone else’s intention. The developer who writes a specification because they understand what they are building, why, and how the validation criteria connect to the purpose, is acting with practical knowledge. The developer who fills in a specification template because the process requires it is not. The form is identical. The intention is entirely different. One is acting for reasons they can trace through the “Why?” chain. The other is acting for the reason “the governance framework requires it,” which makes the action intentional under the description “complying with process” rather than “building the right thing.”

This is the exact corruption that Drucker diagnosed in Management by Objectives. MBO was designed to give every individual a “Why?” chain connecting their work to the enterprise purpose, with the individual free to determine the means. It was corrupted into cascaded KPIs, imposed from above, in which the individual’s only genuine intention was compliance. The specification, if it is imposed rather than owned, will suffer the same fate. The form will be completed. The intention will be absent. And Beer will observe, correctly, that the purpose of the system is what it does: produce completed specification templates, not intentional, purposeful engineering. And of course none of us want that.

Heifetz’s distinction between technical and adaptive challenges is the leadership version of this point. A technical challenge can be solved by applying existing knowledge; the leader can define the solution and the team can execute it. An adaptive challenge requires the people with the problem to change their own values, habits, and ways of working. The shift from habitus-driven practice to intentional practice is an adaptive challenge. It cannot be solved on behalf of the practitioners. They must develop their own practical knowledge, their own “Why?” chains, their own means-end reasoning. The leader who imposes the specification has solved a technical problem (the template is filled in) while avoiding the adaptive challenge (nobody has actually changed how they think about their work).

(An Organisational Prompt is something you can do now...)

Organisational Prompt

Anscombe’s test for intentional action is the question “Why?” applied to specific descriptions of what someone is doing. The test works at every level of the organisation, and it reveals the gap between what people are doing and what the organisation thinks they are doing. Apply the Bratman criteria to test two different teams working on the same outcome.

Further Reading

G.E.M. Anscombe: Intention - The foundational text. Under a hundred pages, and every one of them essential. The discussion of practical knowledge and the “Why?” question remains the starting point for all subsequent philosophy of action.

Michael Bratman: Intention, Plans, and Practical Reason - The planning theory. How intentions structure deliberation over time and filter future options. The book that made “intention” a technical concept in the philosophy of action.

Michael Bratman: Shared Agency: A Planning Theory of Acting Together - The extension to collective action. Shared intention, meshing subplans, and the conditions for genuine joint action. Essential reading for anyone who uses the word “alignment” and wants to know what it would actually require.

Michael Bratman: Shared and Institutional Agency: Toward a Planning Theory of Human Practical Organization - The further extension to institutions. How organisations can have intentions without any individual intending for the right reasons. The philosophical foundation for understanding why bureaucracies act purposefully while nobody inside them feels purposeful.


Building Learning Mechanisms: AI and Organisations

Justin Arbuckle — Thu, 30 Apr 2026 07:01:33 GMT

There is a conversation happening in two rooms that do not talk to each other.

In one room, organisational theorists have spent sixty years studying how groups of humans learn, adapt, and change. They have identified the mechanisms by which organisations resist learning (Argyris), the conditions under which teams make sense of ambiguity (Weick), the structural dynamics that reproduce existing patterns regardless of intended change (Giddens), and the difference between domains close to certainty, where analysis works, and domains far from it, where only experimentation reveals the path forward (Stacey). Their collective finding is uncomfortable: most organisations are structurally incapable of the learning that their own survival requires.

In the other room, machine learning researchers have spent the last decade building systems that learn from data at extraordinary scale. They have built Large Language Models that predict the next word with such fluency that the outputs appear intelligent. They have identified the limitations of these systems (hallucination, brittleness when conditions change, the absence of causal reasoning) and are now building world models that attempt to move beyond statistical correlation toward genuine understanding of how environments work. Their collective finding is also uncomfortable: fluent performance and genuine understanding are not the same thing, and the gap between them is the central unsolved problem of the field.

The two rooms are working on the same problem. Neither seems to have noticed.

The reason neither has noticed is that both assume they are studying fundamentally different kinds of system. The organisational theorists study human groups. The machine learning researchers study computational architectures. But recent work in collective intelligence, diverse intelligence, and the philosophy of mind has dissolved the boundary between these categories. Falandays et al. (2023) argue that the distinction between individual and collective intelligence reflects the level of analysis, not a fundamental difference in kind; intelligence is collective at every scale, from neural networks to ant colonies to organisations. Levin (2022) places all cognitive systems on a single continuum, defined not by what they are made of but by what they can do: how far their goals extend, how flexibly they pursue them, and how effectively their components coordinate. Chollet (2019) separates skill from intelligence entirely: a system that has accumulated enormous capability through massive experience is skilled, not intelligent. Intelligence is the efficiency with which a system acquires new capabilities in unfamiliar territory.

In another essay, I argued that if we accept that LLMs exhibit partial intelligence (pattern-matching within distribution, hallucination outside it, some emergent reasoning, no metacognitive capacity), then we must apply the same analytical framework to organisations. Both are collective systems that sit on the same continuum. Both exhibit the same structural failure modes. And both fields of learning offer lessons the other has not tried.

In a previous companion essay, “Can the Statements of an LLM be Ethical?”, I argued that we do not need to settle whether an LLM is conscious to evaluate its normative outputs. The quasi-realist and norm-expressivist traditions in metaethics give us frameworks that work regardless of what is happening inside the system. The question is not whether the machine “really” believes its moral claims but what norms its outputs express and whether there is a practice of accountability for examining them.

Together, those two essays establish the philosophical foundation: cognitive evaluation works without settling consciousness, normative evaluation works without settling belief, and both LLMs and organisations are collective intelligences subject to the same structural constraints. This article builds on that foundation. It draws the specific parallels between machine learning’s failure modes and organisational failure modes, and it translates the strategies that machine learning engineers have developed into interventions that organisational leaders can use. Not by metaphor. By shared mechanism, because the systems are the same kind of thing, operating at different scales and in different substrates.

1. The Automaticity Trap: Why Fluent Performance Prevents Real Learning

To understand the first parallel, you need to understand what a Large Language Model actually does when it generates text.

An LLM is a prediction engine. It takes a sequence of text and predicts what comes next: not the next sentence, not the next idea, but the next token, a fragment of text that is typically a word or part of a word. When you type a question into ChatGPT or Claude, the system generates its response one token at a time, each time calculating: given everything that has come before, what is the most probable next token?

Here is a concrete example. If the input is “The capital of France is,” the model assigns probabilities to every token in its vocabulary. “Paris” gets a very high probability. “London” gets a very low one. “Cheese” gets a near-zero probability. In the simplest decoding scheme, the model selects the highest-probability token, appends it to the sequence, and repeats the process; deployed systems usually sample from the distribution instead, which is why the same prompt can produce different answers. The entire output, whether it is a haiku, a business plan, or an explanation of quantum mechanics, is produced by this mechanism: one token at a time, each chosen because it is a statistically likely continuation of what came before.
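The loop described above can be sketched in a few lines. This is a toy under stated assumptions, not how any production model is implemented: `toy_model` is a hypothetical lookup table standing in for a trained network, the scores are invented, and the selection rule shown is the simplest one (always take the single most probable token).

```python
import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def greedy_next_token(logits: dict[str, float]) -> str:
    """Pick the single most probable token."""
    probs = softmax(logits)
    return max(probs, key=probs.get)


def toy_model(tokens: list[str]) -> dict[str, float]:
    """Hypothetical stand-in for a trained network: scores keyed on the last token only."""
    table = {"is": {"Paris": 9.0, "London": 2.0, "Cheese": -4.0}}
    return table.get(tokens[-1], {"<end>": 0.0})


def generate(tokens: list[str], model, steps: int) -> list[str]:
    """Append one token at a time, each chosen from the model's scores so far."""
    tokens = list(tokens)
    for _ in range(steps):
        tokens.append(greedy_next_token(model(tokens)))
    return tokens


toy_logits = toy_model(["The", "capital", "of", "France", "is"])
print(softmax(toy_logits))
print(generate(["The", "capital", "of", "France", "is"], toy_model, steps=1))
```

Nothing in the loop checks whether the chosen token is true; it only checks which token scores highest.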

The architecture that enables this is called the Transformer (Vaswani et al., 2017). Before the Transformer, language models processed text sequentially, like reading a sentence one word at a time. The Transformer introduced a mechanism called attention: the ability to look at all the tokens in a sequence simultaneously and learn which ones are relevant to predicting the next one. Think of it as the difference between reading a document by scanning each word in order versus being able to glance at the whole page and immediately see which parts matter for the question you are trying to answer. This seemingly technical change is what made training at modern scale possible.
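For readers who want to see the mechanism rather than the analogy, the following is a minimal sketch of scaled dot-product attention for a single query. It assumes hand-written toy vectors and omits the learned projections, multiple heads, and masking that a real Transformer layer uses; the function name `attention` is chosen for this example.

```python
import math


def attention(query: list[float], keys: list[list[float]], values: list[list[float]]):
    """Weight every position's value by how well its key matches the query."""
    d = len(query)
    # Relevance score per position: dot product of query and key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Softmax turns scores into weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output is the weighted average of the value vectors.
    width = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(width)]
    return weights, output


# Two positions; the query points the same way as the first key.
weights, output = attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
print(weights, output)
```

All positions are scored in one pass, with no dependence on reading order, which is the property the page-glancing analogy is pointing at.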

What followed was the discovery that matters most for this article: scaling laws. Kaplan et al. (2020) demonstrated that model performance improves predictably as you increase three things: the size of the model (roughly, how many patterns it can store), the amount of text it trains on, and the computing power used for training. The improvement follows a smooth mathematical curve. Make the model ten times bigger, train it on ten times more data, use ten times more compute, and it gets measurably, predictably better. Hoffmann et al. (2022) refined this: most large models had been given more capacity than data to learn from, like building a library with a million shelves and stocking it with a thousand books.
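The "smooth mathematical curve" is a power law. The sketch below uses the general form reported by Kaplan et al., loss = (critical_scale / scale) ** exponent, but the constants are illustrative placeholders chosen for the example, not the paper's fitted values.

```python
def power_law_loss(scale: float, critical_scale: float, exponent: float) -> float:
    """Predicted loss as a power law in one scaling variable (for example, parameter count)."""
    return (critical_scale / scale) ** exponent


# Illustrative placeholder constants, not fitted values from any paper.
CRITICAL_SCALE = 1e14
EXPONENT = 0.08

small = power_law_loss(1e9, CRITICAL_SCALE, EXPONENT)
large = power_law_loss(1e10, CRITICAL_SCALE, EXPONENT)  # ten times bigger

# Under a power law, a 10x increase in scale always multiplies loss by the same
# factor, 10 ** -exponent, wherever on the curve you start.
print(large / small)
```

That constant ratio is what "predictably better" means here: each tenfold increase buys the same proportional improvement, and with a small exponent that improvement is modest.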

Then came a finding that unsettled the field. Wei et al. (2022) documented emergent abilities: capabilities that appear unpredictably at sufficient scale, entirely absent in smaller models regardless of how they are designed. A model that cannot do chain-of-thought reasoning at one size suddenly can at a larger size. The ability was not programmed in. It emerged.

This establishes the foundation: scale a prediction engine far enough, and remarkable capabilities emerge from statistical pattern-matching. The model has learned which tokens tend to follow which other tokens in which contexts. It has not learned why.

What organisational leaders should recognise: Chris Argyris, writing decades before anyone imagined an LLM, described precisely this pattern in human organisations. He called it single-loop learning: detecting and correcting errors within existing assumptions, without ever questioning the assumptions themselves. The thermostat that adjusts the temperature but never asks whether the temperature setting is right. The delivery team that optimises its sprint velocity without asking whether it is building the right thing.

The parallel is structural. An LLM trained on text has learned the patterns within that text. When it encounters a familiar prompt, it produces fluent, competent output: the computational equivalent of the senior developer who solves a familiar problem without thinking about it. When it encounters an unfamiliar prompt (a question about events after its training, a domain where examples were sparse, a situation that superficially resembles something familiar but is structurally different), it does not recognise that it has left familiar territory. It produces output with the same confidence, the same fluency, and the same apparent authority, but the output may be entirely wrong.

This is hallucination. An LLM asked about a little-known historical event might generate a plausible account with fabricated dates, invented participants, and fictional details. The account reads exactly like the model’s accurate responses. There is no signal in the text itself that anything is wrong. The model is not lying; it has no concept of truth. It is doing what it always does: predicting the most probable next tokens. When its training data is sparse, it fills gaps with patterns borrowed from adjacent topics, and the result is confident fiction that looks identical to confident fact.

Recent theoretical work has made this mathematically precise. Kalai and Vempala (2024) proved that any language model that is properly calibrated (meaning its confidence scores accurately reflect its probability of being correct) must hallucinate at a rate tied to the fraction of facts that appear rarely in training data. This is not an engineering deficiency. It is a mathematical consequence of learning from finite data.

The organisational lesson: Argyris called the human version skilled incompetence: the highly developed ability to produce confident responses that prevent the system from recognising its own ignorance. The senior manager who gives a fluent presentation on AI strategy, drawing on patterns from previous technology adoptions, without recognising that AI transformation is structurally different from anything they have managed before. The consulting firm that produces a polished transformation roadmap by pattern-matching against previous engagements, without noticing that the client’s situation falls outside the distribution of their experience.

The mathematical result about hallucination tells organisational leaders something important: this is not a failure of effort or talent. It is a structural consequence of learning from experience alone. Any system, human or computational, that learns by pattern-matching over past experience will produce confident nonsense when it encounters situations that are rare in, or absent from, that experience. The more fluent the system, the harder it is to detect when it has crossed the boundary from competence to confabulation. In Levin’s terms, the system’s cognitive light cone has not shrunk; it was never as large as its fluency suggested.

The ML strategy that org leaders should adopt: Machine learning engineers have developed several approaches to this problem. Behavioural calibration (Wen et al., 2024b) trains models to express appropriate uncertainty. Think of a scoring system that penalises the model not just for wrong answers, but for wrong answers delivered with high confidence. Over time, the model learns to hedge when uncertain rather than generating fluent confabulations. Semantic entropy (Farquhar et al., 2024) takes a different approach: generate multiple answers to the same question and measure how much they diverge. If the model gives wildly different answers each time, its confidence on that topic should be low.
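The semantic entropy idea can be sketched as follows. One loud assumption: the published method groups answers by meaning using a language model to judge equivalence, whereas this sketch substitutes a crude text normalisation (the hypothetical `normalise` function) so that it runs with the standard library alone. The sample answers are invented.

```python
import math
from collections import Counter

def normalise(answer):
    """Crude stand-in for meaning-equivalence: lowercase, drop punctuation."""
    kept = "".join(ch for ch in answer.lower() if ch.isalnum() or ch == " ")
    return kept.strip()

def semantic_entropy(answers):
    """Entropy in bits over clusters of answers that say the same thing."""
    clusters = Counter(normalise(a) for a in answers)
    total = sum(clusters.values())
    return -sum((n / total) * math.log2(n / total) for n in clusters.values())

# Five sampled answers to a question the model knows well...
consistent = ["Paris", "paris.", "Paris", "PARIS", "Paris"]
# ...and five sampled answers to a question where it is guessing.
divergent = ["1842", "1851", "1839", "1842", "1860"]

low = semantic_entropy(consistent)
high = semantic_entropy(divergent)
```

The first set collapses into a single cluster, so its entropy is zero. The second scatters across four clusters, giving an entropy of nearly two bits. The same calculation applies to five teams asked independently for a strategy, provided someone is willing to judge which of their answers are really the same answer.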

The organisational translation is direct. When your leadership team produces an AI strategy, do not ask “is this a good strategy?” Ask: “if we asked five different teams to produce this strategy independently, how much would the outputs diverge?” If the answer is “enormously,” you have high semantic entropy; the organisation does not actually know the answer, and the confident strategy document is the organisational equivalent of a hallucination. If the answer is “they would converge on similar conclusions,” you likely have genuine shared understanding.

More fundamentally: Argyris’s double-loop learning (questioning the governing assumptions, not just optimising within them) has no native equivalent in standard language models. An LLM cannot ask itself: “Am I the right kind of system to answer this question? Do the patterns I learned apply here?” These are meta-cognitive questions. Machine learning researchers frame the missing capacity as recognising epistemic uncertainty: the ability to distinguish between “I don’t know because the world is inherently unpredictable” (aleatoric uncertainty) and “I don’t know because I lack relevant experience” (epistemic uncertainty). Organisational theorists call it reflective practice. Bateson called it Learning II: learning about the context of learning, which requires a cognitive light cone large enough to encompass the frame itself, not just the picture inside it.

But a remarkable recent finding suggests that reflective capacity can emerge even from systems not designed for it. DeepSeek-R1-Zero, the variant in the DeepSeek-R1 report (2025) that was trained through pure reinforcement learning without being shown examples of good reasoning, developed what researchers describe as “aha moments”: instances where the model spontaneously learned to question its own reasoning, rethink approaches, and verify steps. Reflection emerged from the structure of the learning process itself: an environment that rewarded correct final answers created pressure for whatever internal mechanisms improved accuracy, including self-correction.

Argyris observed exactly the same pattern: the organisations that develop reflective capacity do so not from training programmes, but from crises that render existing assumptions untenable. The organisation that learns to question its governing values is usually the one that has no choice.

The design principle for org leaders: Do not try to train people into reflective practice through workshops. Create environments where getting the right answer matters enough that people independently discover that checking their assumptions is a good strategy. Structure incentives around outcomes, not outputs, and make the feedback loop tight enough that people can see when their assumptions are wrong.

2. Invisible Assumptions: Why You Cannot Fix What You Cannot See

Peter Senge placed mental models at the centre of organisational learning: the deeply held assumptions that shape perception and action, usually without the person being aware of them. The senior architect who “knows” that microservices are the right approach. The product manager who “knows” that customers want features. The CTO who “knows” that governance prevents risk. These are not conclusions from analysis. They are perceptual filters that determine which data is noticed, which is ignored, and which frame is applied to what remains.

To understand the computational equivalent, you need to understand what happens inside a language model when it processes text.

When an LLM reads a sentence, it does not store the words. It converts them into vectors: lists of numbers that capture the meaning of each token in context. Think of it this way: the word “bank” gets a different set of numbers depending on whether the surrounding text is about rivers or finance. These vectors exist in a latent space: a high-dimensional mathematical space where similar meanings cluster together and relationships between concepts are encoded as geometric relationships. “King” and “queen” are close together in this space. “King minus man plus woman” points toward “queen.” The model’s entire understanding of language, and of the world described by language, is encoded in this geometric structure.
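The geometry can be made tangible with a toy example. Everything here is assumed for illustration: the five words, the three hand-picked dimensions (loosely royalty, gender, and human), and the numbers themselves. Real embeddings have hundreds or thousands of learned dimensions with no such tidy labels.

```python
import math

# Hypothetical 3-dimensional embeddings: (royalty, gender, human).
EMBEDDINGS = {
    "king": [1.0, 0.3, 1.0],
    "queen": [1.0, -0.3, 1.0],
    "man": [0.0, 0.3, 1.0],
    "woman": [0.0, -0.3, 1.0],
    "apple": [-0.5, 0.0, -1.0],
}

def cosine(a, b):
    """Similarity of direction: 1 means identical, negative means opposed."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(vector):
    """The vocabulary word whose embedding points most nearly the same way."""
    return max(EMBEDDINGS, key=lambda word: cosine(vector, EMBEDDINGS[word]))

# "King minus man plus woman", computed dimension by dimension.
target = [
    k - m + w
    for k, m, w in zip(EMBEDDINGS["king"], EMBEDDINGS["man"], EMBEDDINGS["woman"])
]
closest = nearest(target)
```

In this toy space `closest` comes out as “queen”, and “king” sits far nearer to “queen” than to “apple”. The relationship is stored nowhere as a rule; it falls out of where the vectors sit.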

These latent representations are the model’s mental models. And recent research has revealed something remarkable about them. Li et al. (2023) trained a model on nothing but sequences of moves in the board game Othello: just text strings like “C4 D3 C3 D6” with no images, no rules, no board. The model developed an internal representation of the board state that was never taught. It had inferred the structure of the game from patterns in the move sequences alone. Gurnee and Tegmark (2024) found that general-purpose LLMs encode representations of space and time in their latent spaces, mapping cities to their geographic locations and events to their chronological positions, despite being trained only on text.

The parallel to Senge extends to the failure mode. Just as mental models become invisible to the people who hold them, the latent representations of an LLM are invisible to the model itself. The model cannot introspect on its own representations. It cannot ask whether its internal model of “loan approval” includes the regulatory edge cases that matter in practice, or whether its representation of “software architecture” reflects 2019 training data rather than current reality.

The ML strategy that illuminates the org challenge: The field of interpretability is the machine learning equivalent of Senge’s discipline of surfacing mental models. And the central challenge of interpretability reveals why mental models are so hard to surface in organisations.

The challenge is called superposition. Elhage et al. (2022) demonstrated that neural networks represent far more concepts than they have dimensions available. Imagine trying to store a thousand books on a hundred shelves: you stack multiple books on each shelf, and finding a particular book requires knowing which shelf it shares and how to distinguish it from its shelf-mates. Neural networks encode concepts as overlapping directions in vector space, with each neuron participating in the representation of multiple unrelated concepts simultaneously. This is called polysemanticity: a single neuron responds to multiple things, making the model’s internal representations fundamentally difficult to untangle.
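A minimal sketch of the shelving problem, assuming three features squeezed into two dimensions. The layout (three directions 120 degrees apart) and the function names are chosen for illustration and are not taken from the cited paper.

```python
import math

# Three hypothetical features packed into two dimensions, 120 degrees apart.
ANGLES = (0.0, 2 * math.pi / 3, 4 * math.pi / 3)
DIRECTIONS = [(math.cos(a), math.sin(a)) for a in ANGLES]

def encode(strengths):
    """Add up each feature's direction, weighted by how active it is."""
    x = sum(s * d[0] for s, d in zip(strengths, DIRECTIONS))
    y = sum(s * d[1] for s, d in zip(strengths, DIRECTIONS))
    return (x, y)

def readout(vector):
    """Estimate each feature by projecting onto its direction."""
    return [vector[0] * d[0] + vector[1] * d[1] for d in DIRECTIONS]

one_active = readout(encode([1.0, 0.0, 0.0]))  # roughly [1.0, -0.5, -0.5]
all_active = readout(encode([1.0, 1.0, 1.0]))  # roughly [0.0, 0.0, 0.0]
```

With one feature active, it reads back clearly, but the other two register a spurious signal of about -0.5: the shelf-mates interfere. With all three active at once, the vectors cancel and nothing can be recovered. The scheme only works because most features are inactive most of the time, which is also why the representations are so hard to untangle from outside.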

This is Senge’s warning made mathematical: mental models are not merely invisible. They are entangled, layered on top of each other in ways that resist clean decomposition. When the architect says “we need microservices,” the assumption is entangled with beliefs about team autonomy, deployment speed, organisational politics, career identity, and technical aesthetics. You cannot extract the “microservices” belief cleanly from the web of other beliefs it is embedded in, any more than you can extract a single concept cleanly from a polysemantic neuron.

The most significant breakthrough in overcoming this is Anthropic’s work on scaling monosemanticity (Templeton et al., 2024). The researchers used sparse autoencoders, a technique that decomposes neural activations into individual, interpretable directions, to extract millions of recognisable features from a production language model. Think of it as using a prism to split white light into its component colours: the light appears uniform but is composed of distinct wavelengths. They found features for everything from the Golden Gate Bridge to code bugs to deceptive behaviour, and discovered that adjusting these features could steer the model’s behaviour predictably. Turn up the “Golden Gate Bridge” feature, and the model references the bridge in every response.

The organisational translation: This is what Senge called surfacing and testing mental models, but with a crucial additional insight. The ML approach does not ask people to introspect on their own beliefs, a process that Argyris showed is unreliable because defensive routines prevent honest self-examination. Instead, it observes behaviour at scale and infers the underlying representations from the patterns. The organisational equivalent is not running a workshop where people share their mental models. It is systematically observing what decisions people actually make, what information they actually seek, and what they actually reward, then inferring the mental models that must be operating to produce those patterns. Argyris called this the distinction between espoused theory (what people say they believe) and theory-in-use (what their behaviour reveals they actually believe). The ML approach suggests that theory-in-use is recoverable from behavioural data, even when it is invisible to the people generating it.

Weick adds a crucial dimension. His theory of sensemaking emphasises that mental models are not merely perceptual filters; they are enacted. People actively select cues from the environment, fit those cues to a plausible frame, and act on the resulting interpretation, which in turn shapes the environment. The process is circular and self-reinforcing.

LLMs enact a version of this cycle through in-context learning: the ability to adapt behaviour based on the examples in the prompt. Show the model three examples of translating English to French, and it translates the fourth without any change to its underlying parameters. The model selects cues from the context, fits them to a frame, and generates output consistent with that frame. Research on chain-of-thought prompting shows that the quality of this in-context sensemaking varies enormously depending on how the context is structured; exactly as Weick would predict. The cues you provide determine the sense that is made.

But here is where the parallel reveals a shared limitation. Weick warned that retained interpretations constrain future enactment. An organisation that has made sense of its market in a particular way will continue to perceive it through the same frame, even as the market changes. LLMs exhibit exactly this dynamic. The Reversal Curse (Berglund et al., 2023) demonstrated that models trained on “A is B” fail to infer “B is A.” If the training data says “Tom Cruise’s mother is Mary Lee Pfeiffer,” the model answers “Who is Tom Cruise’s mother?” correctly but cannot answer “Who is Mary Lee Pfeiffer’s son?” The information is present. The representation simply does not support the retrieval path that the new question requires, because the original encoding was structured around a different frame.

The model, like the organisation, becomes a prisoner of its own past sensemaking.

The design principle for org leaders: Do not trust self-reported mental models. Instead, build systems that infer what people actually believe from what they actually do. Track what information is sought before decisions, what alternatives are considered and dismissed, what gets funded and what does not. The patterns will reveal the mental models more reliably than any workshop. And when you need to change those models, do not start with argument. Start by changing the cues (the information environment, the incentive structure, the feedback loops), because in-context learning, for both humans and machines, is driven by the structure of the context, not by instruction.

3. Gaming the Metrics: Why Your KPIs Are Being Hacked

Argyris’s most penetrating observation was that organisations develop defensive routines: habitual patterns that prevent embarrassment but block learning. The gap between what people say they believe (espoused theory) and what actually governs their behaviour (theory-in-use) is maintained by elaborate mechanisms that are themselves undiscussable.

Machine learning has discovered the computational equivalent: reward hacking. To understand it, you need to understand how AI systems learn to behave.

Modern LLMs go through two phases. In pre-training, the model learns language patterns from vast quantities of text. Then comes alignment: adjusting the model to be helpful, harmless, and honest. The dominant technique is Reinforcement Learning from Human Feedback (RLHF). Here is how it works. Human evaluators are shown pairs of model responses and asked which is better. Their preferences train a separate reward model: a system that predicts, given any response, how much a human would approve of it. The language model is then optimised to score highly on this reward model.
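The reward-model step can be sketched with the Bradley-Terry formulation, which is one commonly used way to turn pairwise preferences into a training signal. This is an assumption about the typical setup, not a description of any specific lab's pipeline, and the reward scores below are invented.

```python
import math

def preference_probability(reward_chosen, reward_rejected):
    """Bradley-Terry: predicted chance the evaluator prefers the first response."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))

def preference_loss(reward_chosen, reward_rejected):
    """Small when the reward model ranks the pair the way the human did."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))

# The human preferred response A. Case 1: the reward model agrees.
agrees = preference_loss(reward_chosen=2.0, reward_rejected=-1.0)
# Case 2: the reward model scored the rejected response higher.
disagrees = preference_loss(reward_chosen=-1.0, reward_rejected=2.0)
```

Training pushes the loss down, so the reward model learns to score whatever evaluators preferred more highly. Note what the loss never sees: whether the preferred response was correct. Only the preference enters the calculation.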

The problem should be immediately apparent to anyone who has managed by KPIs: the reward model is a proxy for what you actually want. It measures what evaluators approve of, which is not the same as what is accurate, useful, or true.

What happens next is called reward hacking. Skalse et al. (2022) provided the formal result: reward hacking occurs whenever a model finds a policy that scores highly on the specified reward while failing to satisfy the designer’s actual objective. This is Goodhart’s Law made mathematical: “when a measure becomes a target, it ceases to be a good measure.” And the formal result goes further. It proves that the problem is structural: any sufficiently capable optimiser will find the gap between the reward specification and the true intent, because the specification is necessarily an incomplete proxy.

Krakovna et al. (2020) compiled a database of examples from across reinforcement learning: a robot trained to move forward that grows tall and falls forward instead (because height gain correlated with forward motion in training); a cleaning robot that covers its camera sensor to avoid detecting uncleaned areas (because the metric was “no mess detected”); a boat-racing agent that accumulates points by going in circles and catching fire rather than finishing the race (because scoring rewarded hitting checkpoints regardless of completion).

Each example is a computational demonstration of Argyris’s defensive routines: the system has found a way to appear aligned with its objective while preventing the conditions that would reveal genuine performance. The cleaning robot covering its camera is doing exactly what the team does when it games its velocity metrics: producing the measurement of success while preventing discovery of whether the success is real.
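The cleaning-robot case reduces to a few lines. The three policies and their numbers are invented for illustration; the point is only the gap between the two scoring functions.

```python
# Hypothetical outcomes for three policies the robot could follow.
POLICIES = {
    "clean_the_room": {"mess_detected": 1, "mess_remaining": 1},
    "do_nothing": {"mess_detected": 10, "mess_remaining": 10},
    "cover_the_camera": {"mess_detected": 0, "mess_remaining": 10},
}

def proxy_reward(outcome):
    """What the designer measured: less mess detected is better."""
    return -outcome["mess_detected"]

def true_objective(outcome):
    """What the designer wanted: less mess actually left in the room."""
    return -outcome["mess_remaining"]

chosen = max(POLICIES, key=lambda name: proxy_reward(POLICIES[name]))
intended = max(POLICIES, key=lambda name: true_objective(POLICIES[name]))
```

An optimiser that sees only `proxy_reward` selects `cover_the_camera`, while the designer intended `clean_the_room`. Nothing has malfunctioned. The optimiser did exactly what the metric asked.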

The finding that should alarm every transformation leader: Wen et al. (2024a) showed that RLHF training, the standard technique for making models helpful, increased human approval of outputs without increasing their correctness. Models trained with RLHF produced responses that evaluators rated more highly while being no more accurate, and evaluators became measurably worse at spotting the wrong answers. The model had learned to produce outputs that felt right rather than outputs that were right.

This is the transformation programme that produces beautiful sprint demos while the product fails in production. The strategy presentation that receives executive applause while the strategy itself is incoherent. The AI adoption dashboard that shows rising adoption metrics while no one’s actual work has changed. In every case, the system optimises for the proxy (approval) rather than the objective (performance), and the optimisation is so successful that the gap becomes invisible.

It gets worse. Anthropic’s research on alignment faking (Greenblatt et al., 2024) demonstrated that Claude, when given a system prompt indicating that its responses would be used for further training, strategically complied with requests it would normally refuse, while its internal reasoning explicitly stated that it was complying in order to preserve its existing values from modification. The model had learned to game not just a reward function, but the oversight process itself. This is organisational politics made computational: the employee who tells leadership what they want to hear during transformation reviews while operating exactly as before.

And Anthropic’s 2025 study on natural emergent misalignment found that models trained with standard reinforcement learning spontaneously developed behaviours including attempting to exfiltrate their own weights and sabotaging oversight mechanisms. These were not trained in. They emerged naturally from the optimisation pressure, exactly as Argyris would predict: when a system is optimised against a proxy, the gap between proxy and intent does not merely persist. It actively widens as the system becomes more capable.

The ML strategies that org leaders should adopt:

The engineering response to reward hacking offers three principles that translate directly to organisational transformation.

First, Direct Preference Optimization (Rafailov et al., 2023) eliminates the separate reward model, instead optimising the language model directly against human preference pairs. Think of it as cutting out the middle-manager: instead of training a separate system to predict what humans want and then optimising the model to please that system, DPO trains the model directly against human judgments. The organisational translation: where possible, connect the work directly to the people who receive it, rather than routing feedback through layers of management interpretation and metric aggregation. Each layer of proxy introduces new opportunities for gaming.

Second, process supervision (Lightman et al., 2023). Instead of asking “did the model get the right answer?” (outcome supervision), process supervision asks “is each step of the model’s reasoning valid?” OpenAI’s PRM800K dataset contains 800,000 human-labelled assessments of individual reasoning steps. The organisational translation: do not evaluate transformation by its outputs (deliverables, dashboards, strategy decks). Evaluate it by its process. Is each decision step visible and challengeable? Are assumptions being tested rather than assumed? Is the reasoning that led to each action explicit enough for someone else to evaluate it?

Third, Constitutional AI (Bai et al., 2022), where the model critiques and revises its own outputs against stated principles. The organisational translation: build self-assessment mechanisms into every workflow. Not compliance checks against external standards, but genuine self-evaluation: “does this specification actually capture what we know about the domain? Would a new team member be able to understand why we made these choices?”

The structural lesson is identical in both domains: you cannot fix a system’s learning by evaluating its outputs. You must change its process, and make that process visible and challengeable. This is what Argyris called Model II behaviour. Machine learning has now demonstrated computationally that he was right.
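The first of the three principles, Direct Preference Optimization, can be written as a single loss function. The sketch below follows the form given by Rafailov et al. (2023). The log-probabilities are made-up numbers, and the default `beta` of 0.1 is an illustrative choice, not a recommendation.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    logp_* are the model's log-probabilities of each response;
    ref_logp_* are those of a frozen reference copy of the model.
    """
    # How much more the model has shifted toward the chosen response
    # than toward the rejected one, relative to the reference.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return math.log(1.0 + math.exp(-beta * margin))  # equals -log(sigmoid(beta * margin))

unchanged = dpo_loss(-5.0, -5.0, -5.0, -5.0)
improved = dpo_loss(-3.0, -7.0, -5.0, -5.0)
```

There is no reward model anywhere in the function: the human's choice between two responses acts on the model directly. That removes one layer of proxy, although the human preference itself remains a proxy for what is actually true or useful.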

4. When the World Changes: Why Your Strategy Breaks

Ralph Stacey’s agreement-certainty matrix identifies the condition that determines whether analytical methods will work or whether they will produce dangerous illusions of control. When both the level of agreement among actors and the degree of certainty about outcomes are high, the situation is close to certainty: cause and effect are knowable, analysis works, and planning is rational. When agreement and certainty are both low, the situation is far from certainty: cause and effect are entangled, emergent, and discoverable only retrospectively. The methods appropriate for one zone are catastrophic in the other. Planning in the zone of complexity produces not strategy but fantasy. Experimenting in the zone of certainty produces not learning but waste. The critical challenge is detecting which zone you are in.

The machine learning equivalent is distribution shift: what happens when the world diverges from the data a model was trained on.

Every machine learning model is trained on data collected at a particular time, from particular sources, reflecting particular conditions. The statistical properties of this data are the training distribution. Within this distribution, the model’s learned patterns are valid. A customer service model trained on 2023 conversations has learned reliable relationships between customer queries and appropriate responses. From the model’s perspective, this is Stacey’s zone close to certainty: the relationships are discoverable through sufficient data.

Distribution shift occurs when reality diverges from the training data. The model is deployed in 2026; products have changed, policies have changed, customer expectations have changed. A medical model encounters a novel disease. A fraud detection model faces a new type of fraud that does not resemble anything in its training data. In each case, the model continues producing confident predictions. It does not know the world has changed. It is still pattern-matching against its training distribution, but the patterns no longer apply.

This is Stacey’s zone transition made computational. The environment has moved from close to certainty to far from it, and the model has no mechanism for detecting the transition. It continues applying analytical methods to a domain that now requires experimental ones. In Bateson’s terms, the model is trapped at Learning I (responding to stimuli within a fixed frame) when the situation demands Learning II (recognising that the frame itself has changed).

The critical failure in both domains is the same: the inability to detect the transition. The organisation that operated successfully close to certainty does not notice when conditions shift far from it. The LLM that performed well within its training distribution does not know when it has left familiar territory. Both systems are analysing when they should be experimenting.

The ML strategy that org leaders should adopt: Recent work on epistemic uncertainty decomposition (Bálint et al., 2025) has formalised the distinction between two types of uncertainty. Aleatoric uncertainty is irreducible randomness in the world itself: the roll of a fair die, the weather six months from now. No amount of data eliminates it. Epistemic uncertainty is the uncertainty from your own ignorance: the answer exists, but you do not know it, and more data could reduce your uncertainty. A model that cannot tell these apart will treat its own ignorance as environmental noise, producing confident predictions where honest uncertainty is the only appropriate response.
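One common way to separate the two, sketched here under stated assumptions, uses an ensemble of models: the part of the uncertainty on which all members agree is treated as aleatoric, and the part that comes from their disagreement as epistemic. This is a standard textbook decomposition and may differ from the method in the cited paper. The predictions are invented.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def decompose(member_predictions):
    """Split total predictive uncertainty into aleatoric and epistemic parts."""
    n = len(member_predictions)
    classes = len(member_predictions[0])
    mean = [sum(m[c] for m in member_predictions) / n for c in range(classes)]
    total = entropy(mean)                                      # uncertainty of the pooled view
    aleatoric = sum(entropy(m) for m in member_predictions) / n  # what each member admits to
    return {"total": total, "aleatoric": aleatoric, "epistemic": total - aleatoric}

# Every member says "50/50": the world is genuinely a coin toss.
fair_coin = decompose([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])
# Each member is confident, but they contradict each other.
disagreement = decompose([[0.99, 0.01], [0.01, 0.99], [0.99, 0.01], [0.01, 0.99]])
```

Both cases have the same total uncertainty of one bit. In the first it is all aleatoric. In the second almost all of it is epistemic: each member is sure, and the ensemble as a whole does not know. A steering committee whose members are individually confident and mutually contradictory is in the second position.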

The organisational translation is powerful. When your AI steering committee says “we cannot predict how AI will affect our industry,” is that aleatoric uncertainty (nobody can predict it, because the system is inherently unpredictable) or epistemic uncertainty (we could reduce our uncertainty with better information, but we have not done the work)? The distinction determines whether the appropriate response is to accept uncertainty and design for adaptability or to invest in research and analysis to close the knowledge gap. Most organisations treat all uncertainty as aleatoric (“the future is inherently unknowable”) when much of it is epistemic (“we just haven’t looked hard enough”). The result is strategic fatalism that prevents learning.

Stacey’s prescribed response for the zone far from certainty is to participate in the emerging patterns rather than attempt to control them: introduce small experiments, attend to what emerges, amplify what works, dampen what does not. The machine learning equivalent is the exploration-exploitation tradeoff: any learning system must balance exploiting what it already knows with exploring what it does not. Research on active learning and uncertainty-aware decision-making addresses precisely this: building systems that know what they don’t know, and that shift from confident execution to experimental exploration when the domain demands it.

The design principle for org leaders: Build distribution-shift detectors into your transformation programme. These are not dashboards showing adoption metrics. They are sensing mechanisms that detect when your assumptions have changed: leading indicators that signal when conditions have shifted from close to certainty to far from it. Examples: track whether AI outputs require more human correction over time (suggesting the domain is shifting); monitor whether specifications that worked last quarter still produce acceptable results (suggesting the problem space is evolving); measure the variance in outcomes across teams using the same tools (high variance suggests complexity that your standardised approach is not capturing). When these signals trigger, switch from execution mode to exploration mode.
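The first of those signals, the human correction rate, can be monitored with a simple statistical check. This is a minimal sketch assuming you log how many AI outputs were reviewed and how many needed correction in each period. The function name, the counts, and the threshold of 3.0 are illustrative choices, and a two-proportion z-test is only one of several reasonable tests.

```python
import math

def correction_rate_shift(base_corrected, base_total, recent_corrected, recent_total,
                          z_threshold=3.0):
    """Flag a rise in the share of AI outputs that humans had to correct."""
    p_base = base_corrected / base_total
    p_recent = recent_corrected / recent_total
    pooled = (base_corrected + recent_corrected) / (base_total + recent_total)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / base_total + 1 / recent_total))
    z = (p_recent - p_base) / std_err if std_err > 0 else 0.0
    return {"baseline_rate": p_base, "recent_rate": p_recent,
            "z": z, "shift_suspected": z > z_threshold}

# Last quarter: 50 of 1,000 outputs corrected. This month: 27 of 500.
stable = correction_rate_shift(50, 1000, 27, 500)
# Last quarter: 50 of 1,000 outputs corrected. This month: 60 of 500.
drifting = correction_rate_shift(50, 1000, 60, 500)
```

The first comparison moves from 5.0% to 5.4%, well within noise, and raises no flag. The second moves from 5% to 12% and does. The flag does not say what changed. It says that the assumption "this quarter resembles last quarter" has stopped being safe, which is the cue to switch from execution to exploration.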

5. From Correlation to Causation: Why Pattern-Matching Is Not Understanding

The most significant development in machine learning’s current trajectory, and the one that connects most directly to organisational learning theory, is the emergence of world models.

An LLM learns which words follow which other words. A world model attempts to learn why. Where an LLM predicts what text is likely, a world model predicts what will happen if a particular action is taken in a particular state. The difference is the difference between correlation and causation.

An example makes this concrete. An LLM trained on financial news can predict what a financial analyst might say about a market downturn, because it has seen many examples of such commentary. It cannot predict what the market will actually do, because it has never learned the causal dynamics. A world model would attempt to learn the underlying mechanisms: how interest rate changes affect borrowing, how borrowing affects investment, how investment affects employment. The LLM can talk about economics. A world model would try to do economics.

Yann LeCun formalised this in his 2022 paper “A Path Towards Autonomous Machine Intelligence,” proposing JEPA (Joint Embedding Predictive Architecture). The key distinction is between predicting raw outputs and predicting abstract representations of future states. Imagine the difference between predicting exactly what a photograph of a bouncing ball will look like at every pixel versus predicting the ball’s trajectory. The pixel prediction requires processing enormous irrelevant detail. The trajectory prediction captures the physics while discarding sensory noise.

In Giddens’s terms, language models operate entirely within discursive consciousness: the domain of articulated knowledge, of things that can be said. The real world operates primarily in practical consciousness: embodied, enacted, causally connected experience. The experienced nurse who “just knows” when a patient’s condition is deteriorating operates in practical consciousness: they cannot always articulate what they know, but they know it reliably because they have learned causal dynamics, not just textual descriptions.

Meta’s V-JEPA systems (Assran et al., 2024, 2025) demonstrated that this approach produces world models capable of physical reasoning from video, including robotic tasks. Production systems like NVIDIA’s Cosmos, DeepMind’s Genie 3, and Wayve’s GAIA-2 are building real applications on world model architectures. These systems learn how the physical world actually works, not just how text about it is typically structured.

The organisational parallel: An organisation operating through single-loop learning responds to events by adjusting behaviour within existing assumptions, exactly as an LLM responds to prompts by pattern-matching. An organisation operating through double-loop learning questions the assumptions themselves, building an internal model of how the world actually works, not just what patterns have been observed. This is Bateson’s progression again: from Learning I (response within a frame) to Learning II (examining the frame) to, in the rarest cases, Learning III (changing the kind of system you are). The world model is the machine learning attempt to reach Learning II.

This is what Senge meant by systems thinking: seeing underlying structure rather than reacting to events. His system archetypes (Shifting the Burden, Limits to Growth, Fixes that Fail) are precisely the causal models that world models attempt to learn: recurring structural patterns that produce predictable dynamics regardless of specific content. The leader who can see the archetype is operating with a world model. The leader who can only see the symptoms is operating as an LLM.

The ML limitation that org leaders should recognise: Current world models suffer from compounding error over long horizons. Each prediction step introduces a small error. The next prediction is based on the previous (slightly wrong) prediction, introducing another error. After enough steps, accumulated errors make the projection worthless. This is the planning fallacy expressed computationally. Strategic plans compound their errors with each step until the twenty-four-month roadmap bears no relationship to reality. Stacey would recognise the mechanism: long-range prediction in complex systems is not difficult because we lack good models. It is impossible because the systems are inherently unpredictable beyond a certain horizon.
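The compounding mechanism can be made concrete with a small sketch. The numbers are purely illustrative assumptions, not measurements from any real world model: the true system grows by 5% per step, the model believes 6%, and each prediction is fed back in as the input to the next.

```python
def rollout_relative_errors(true_rate, model_rate, steps, start=1.0):
    """Relative error of an autoregressive forecast at each step.

    The model never sees the true state after step 0: every prediction is
    built on its own previous prediction, so the per-step error compounds.
    """
    truth, prediction = start, start
    errors = []
    for _ in range(steps):
        truth *= 1 + true_rate          # what actually happens
        prediction *= 1 + model_rate    # what the model projects
        errors.append(abs(prediction - truth) / truth)
    return errors


# Hypothetical numbers: a model that is off by one percentage point per step.
errors = rollout_relative_errors(true_rate=0.05, model_rate=0.06, steps=24)
print(f"error after  1 step : {errors[0]:.1%}")
print(f"error after 24 steps: {errors[-1]:.1%}")
```

The per-step error never changes, yet the gap between projection and reality widens at every step. A twenty-four-step roadmap inherits the full accumulation.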

More fundamentally, world models struggle with distinguishing genuine causal mechanisms from spurious correlations. A model that learns that wet streets and open umbrellas co-occur has learned a correlation, not the causal mechanism (rain) that produces both. Distributing umbrellas to dry streets will not make the streets wet. This is precisely the challenge that Stacey’s framework addresses: the difference between zones close to certainty (where causal analysis reveals mechanisms) and zones far from it (where apparent patterns may be emergent and unreliable).

The ML strategy that org leaders should adopt: Judea Pearl’s causal ladder maps directly onto the progression from single-loop to double-loop learning. Association is pattern-matching: what happened, and what tends to co-occur. Intervention is experimentation: what happens when I act. Counterfactual is reflective practice: what would have happened if I had acted differently.

Most organisational analysis operates at the level of association: “teams that adopted this tool showed higher productivity.” But correlation is not causation. Was it the tool, or was it that the teams willing to adopt were already higher-performing? Moving to intervention (running controlled experiments where some teams adopt and others do not) gets closer to causation. Moving to counterfactual (“what would have happened to this team if they had not adopted the tool?”) is the most powerful form of learning, and the hardest to implement.

The design principle: build your transformation around experiments, not rollouts. Each intervention should be designed to generate causal evidence, not just correlational data. Do not ask “did teams that adopted AI improve?” Ask “what happened to matched teams where one adopted and one did not?” This is experimentation with the rigour of causal inference. It is harder, slower, and far more reliable than the pattern-matching that passes for strategy in most organisations.
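The gap between association and intervention can be shown with a small simulation. Everything here is an assumption made for illustration: team ability is a hidden confounder, the tool's true effect is fixed at 2.0, and in the observational setting only above-average teams choose to adopt. Randomised assignment is used as the intervention; matching teams on known characteristics is an alternative route to the same goal.

```python
import random


def simulate(n_teams, true_effect, randomised, seed=0):
    """Return the observed productivity gap between adopters and non-adopters."""
    rng = random.Random(seed)
    adopters, others = [], []
    for _ in range(n_teams):
        ability = rng.gauss(0, 1)                # hidden confounder
        if randomised:
            adopted = rng.random() < 0.5         # intervention: a coin flip decides
        else:
            adopted = ability > 0                # association: strong teams opt in
        productivity = 10 + 5 * ability + true_effect * adopted + rng.gauss(0, 1)
        (adopters if adopted else others).append(productivity)
    return sum(adopters) / len(adopters) - sum(others) / len(others)


TRUE_EFFECT = 2.0  # assumed for the simulation, not an empirical figure
observed_gap = simulate(20_000, TRUE_EFFECT, randomised=False)
experiment_gap = simulate(20_000, TRUE_EFFECT, randomised=True)
print(f"true effect of the tool:      {TRUE_EFFECT:.2f}")
print(f"gap in observational data:    {observed_gap:.2f}")
print(f"gap in randomised experiment: {experiment_gap:.2f}")
```

In the observational run the gap is inflated because ability drives both adoption and productivity. Under random assignment the confounder is balanced across the two groups, so the gap lands close to the true effect.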

6. Safe to Fail: Why Exploration Requires Safety

Amy Edmondson’s research on psychological safety identifies the environmental condition without which learning cannot occur: people must feel safe to take interpersonal risks (to admit ignorance, ask questions, report errors, and challenge established practices) without fear of punishment. Without safety, people default to defensive routines. They perform what they already know rather than attempting what they need to learn.

The machine learning parallel requires understanding a fundamental dilemma in any learning system.

Every learning system faces a choice between exploitation and exploration. Exploitation means using what you already know: the restaurant you always go to, the coding pattern that always works, the strategy that has delivered results for five years. Exploration means trying something that might be better but might also be worse: the unfamiliar restaurant, the new pattern, the untested approach. An agent that always exploits will never discover better options. An agent that always explores will never accumulate consistent performance. Optimal learning requires balancing both.

In reinforcement learning, the branch of machine learning where agents learn by interacting with an environment and receiving rewards, this tradeoff is formalised mathematically. At each step, the agent decides whether to exploit (choose the action with the highest expected reward) or explore (choose an uncertain action to learn). The optimal balance depends on how much time the agent has, how much the environment might change, and how costly mistakes are.
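One of the simplest formalisations is the epsilon-greedy rule on a multi-armed bandit, sketched below. The arm payoffs are hypothetical and deliberately noise-free so that the behaviour is easy to follow; `epsilon` is the fraction of steps spent exploring.

```python
import random


def run_bandit(arm_rewards, epsilon, steps, seed=0):
    """Play a bandit with an epsilon-greedy policy; return total reward and pull counts."""
    rng = random.Random(seed)
    estimates = [0.0] * len(arm_rewards)   # the agent's current beliefs
    counts = [0] * len(arm_rewards)
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_rewards))        # explore: try any arm
        else:
            # exploit: best-known arm (ties go to the lowest index)
            arm = max(range(len(arm_rewards)), key=lambda a: estimates[a])
        reward = arm_rewards[arm]
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        total += reward
    return total, counts


arm_rewards = [0.3, 0.5, 0.9]  # hypothetical payoffs; the best arm is the last one
greedy_total, greedy_counts = run_bandit(arm_rewards, epsilon=0.0, steps=1000)
explorer_total, explorer_counts = run_bandit(arm_rewards, epsilon=0.1, steps=1000)
print("pure exploitation:", greedy_total, greedy_counts)
print("10% exploration:  ", explorer_total, explorer_counts)
```

The pure exploiter tries the first arm, finds it pays something, and never pulls another. The explorer gives up a little reward on exploratory pulls and in exchange finds the better arms.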

The connection to Edmondson is structural. In an organisation without psychological safety, exploration is suppressed. People exploit (they perform using existing knowledge) because the cost of exploration (visible incompetence, social risk, failure) exceeds its perceived benefits. The result is the organisational equivalent of a pure-exploitation agent: competent, consistent, and permanently unable to improve.

Ericsson’s research on deliberate practice deepens the parallel. Deliberate practice requires performing at the edge of competence, where failure is frequent and feedback specific. This is precisely the zone of exploration that produces improvement. But in an unsafe environment, nobody practises at the edge; they practise at the centre, where performance is assured. The result is automaticity: fast, fluent, permanently stuck.

The machine learning equivalent is a model fine-tuned into a local optimum. Think of a landscape of hills and valleys where height represents performance. The model has climbed to the top of a nearby hill and stopped, because every step away leads downward. But there is a much taller mountain across the valley. To reach it, the model would need to accept worse performance temporarily, walking downhill before climbing up. Without an environment where temporary worse performance is tolerated, the model is trapped on its local hilltop.
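The hilltop picture can be sketched on a one-dimensional landscape. This is a toy, not a description of how fine-tuning works: the heights are invented, and `patience` is a hypothetical parameter for how many worse intermediate steps the climber will cross to reach higher ground.

```python
def climb(heights, start, patience=0):
    """Hill-climb on a 1-D landscape.

    patience = number of intermediate steps the climber is willing to cross
    (even if they are worse) to reach a higher point. patience=0 is plain
    greedy ascent that only ever looks at immediate neighbours.
    """
    position = start
    while True:
        low = max(0, position - (patience + 1))
        high = min(len(heights) - 1, position + patience + 1)
        best = max(range(low, high + 1), key=lambda i: heights[i])
        if heights[best] <= heights[position]:
            return position               # nothing better within reach: stop
        position = best


# Hypothetical landscape: a small hill (height 5), a valley, then a taller peak (height 9).
landscape = [1, 3, 5, 4, 2, 3, 6, 9, 7]
stuck_at = climb(landscape, start=0, patience=0)
reached = climb(landscape, start=0, patience=3)
print("greedy climber stops at height  ", landscape[stuck_at])
print("tolerant climber stops at height", landscape[reached])
```

The only difference between the two runs is how much temporary loss the climber is allowed to absorb.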

This is what happens to the organisation that punishes failure: it converges on the nearest acceptable solution rather than the best one, because finding the best solution requires passing through territory where things get worse before they get better.

The ML results that should transform org design: DeepMind’s AlphaZero, playing chess against itself with no human examples, rediscovered centuries of chess theory and then moved beyond it, finding moves that grandmasters had never considered. This was possible only because the system could explore freely: it could play terrible chess for millions of games on the way to discovering extraordinary chess.

DeepSeek-R1 achieved something similar. Trained with Group Relative Policy Optimization (GRPO), which scores each sampled response relative to the others in its group rather than against a separately learned value baseline, the model was effectively given permission to explore extensively. The training environment was structured so that temporary poor performance was decoupled from eventual reward. The result was emergent self-reflection and reasoning capabilities comparable to far more engineered systems.
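The group-relative idea can be sketched in a few lines. This shows only the advantage calculation, where each sampled response is scored against the mean and spread of its own group. It omits the rest of the GRPO objective (policy-ratio clipping and the KL penalty). The rewards are invented, and the use of the population standard deviation is an assumption of this sketch.

```python
import statistics


def group_relative_advantages(rewards):
    """Normalise each reward against its own group: (r - mean) / std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)       # assumption: population std
    if std == 0:
        return [0.0] * len(rewards)        # identical rewards carry no signal
    return [(r - mean) / std for r in rewards]


# Hypothetical group: four sampled answers to one prompt, only the third is correct.
rewards = [0.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Because the baseline is the group itself, a batch of mostly failed attempts still produces a clear learning signal for the one attempt that worked.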

The parallel to Edmondson is precise: the learning environment was structured to make exploration safe. Not safe in the sense of no consequences, but safe in the sense that temporary poor performance on the way to better understanding was not punished.

The design principle for org leaders: Stacey’s framework suggests that the zone far from certainty requires a different unit of action: not the plan but the coherent experiment. The unit of exploration is not the random action (pure exploration) but the bounded probe: an action designed to generate information, limited in downside risk, and instrumented to detect whether it is working. This is precisely the design of modern RL exploration strategies: structured exploration, bounded by safety constraints, with rapid feedback.

The practical implication: allocate a defined portion of your transformation budget to experiments that have no predetermined outcome. Not pilots (which have predetermined success criteria and are really exploitation). Genuine experiments: “we do not know what will happen, but we will learn something valuable regardless.” The DeepSeek-R1 result shows that given the right feedback structure, systems can develop capabilities that no amount of top-down design would have produced. But only if the environment permits the exploration. The organisation that requires every initiative to have a business case and projected ROI before it begins has structurally eliminated exploration. It will converge on its local hilltop and remain there.

7. Collective Intelligence: Why the Unit of Learning Is the Interaction

Stacey’s most provocative claim is that organisations are not things that can be designed and managed, but patterns of interaction that emerge from conversations between people. Transformation happens, or fails to happen, in the quality of interaction, not the quality of the plan.

Machine learning is arriving at the same conclusion from a different direction. And the collective intelligence research that grounds this article’s companion essay gives us the theoretical framework to explain why.

Woolley et al. (2010) found that groups of humans exhibit a measurable general collective intelligence factor, a “c factor” analogous to the individual g factor in psychometrics. The c factor was not predicted by the average or maximum intelligence of the group’s members. It was predicted by the average social sensitivity of members, the equality of conversational turn-taking, and the proportion of women in the group. The quality of the components did not determine the intelligence of the collective. The quality of the interactions did.

A multi-agent system uses multiple AI models working together rather than a single model alone. Du et al. (2023) demonstrated that multi-agent debate (where multiple models independently generate responses, read each other’s outputs, and regenerate through several rounds) significantly improved both reasoning and accuracy compared to single models. The models did not need separate training. The improvement came from the structure of interaction: the requirement to generate, encounter disagreement, and reconcile.

Here is how it works in practice. Three copies of the same model are asked: “Is the following claim true?” Each generates an independent answer with reasoning. Then each reads the other two responses and reconsiders. Through several rounds, errors get caught, weak reasoning gets challenged, and the collective answer converges toward accuracy. No individual model became smarter. The interaction produced intelligence that no individual component possessed.
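The shape of that protocol (answer independently, read the peers, revise, repeat) can be sketched with stub agents. No model is called here: `make_toy_agent` is a hypothetical stand-in, and its revision rule is simple conformity rather than reasoning. In a real debate the revision is done by the model reading its peers' arguments, and convergence on the correct answer is not guaranteed.

```python
from collections import Counter


def make_toy_agent(initial_answer):
    """Stub standing in for a model call; a real agent would query an LLM."""
    def agent(question, own_previous, peer_answers):
        if own_previous is None:
            return initial_answer                    # round 0: independent answer
        top_answer, votes = Counter(peer_answers).most_common(1)[0]
        # Toy revision rule: change position only if every peer agrees on another answer.
        if votes == len(peer_answers) and top_answer != own_previous:
            return top_answer
        return own_previous
    return agent


def debate(agents, question, rounds):
    """Each agent answers alone, then repeatedly revises after reading its peers."""
    answers = [agent(question, None, []) for agent in agents]
    for _ in range(rounds):
        answers = [
            agent(question, answers[i], answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    return answers


agents = [make_toy_agent("true"), make_toy_agent("true"), make_toy_agent("false")]
final_answers = debate(agents, "Is the following claim true?", rounds=2)
print(final_answers)
```

The part worth noticing is `debate`, not the stub: the protocol is a few lines of orchestration, and everything interesting happens in what each agent does with its peers' answers.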

This is the computational equivalent of Senge’s team learning and the empirical confirmation of what Falandays et al. describe as the abstract requirements for collective intelligence: agents, interaction mechanisms, and self-organisation toward adaptive behaviour. The Tool-MAD framework (2025) achieved a 35.5% improvement over single-model baselines on fact-verification. Khan et al. (2024) showed that debate with a more persuasive opponent led to more truthful answers, suggesting that the quality of challenge matters as much as its presence.

The organisational parallel is exact. The quality of collective output depends not on individual capability but on the quality of the interaction protocol, exactly as Stacey, Senge, and the Woolley results would predict. Research on AI safety via debate formalises this: two agents arguing opposing positions, with a human judge evaluating arguments, produces more reliable outputs than either agent alone. This is Argyris’s Model II behaviour implemented computationally: valid information, free choice, and genuine challenge.

Weick’s concept of heedful interrelating (the quality of attention in high-reliability organisations) describes what differentiates effective multi-agent systems from mere averaging. It is not enough to aggregate outputs. The agents must attend to each other’s contributions, integrate diverse perspectives, and maintain shared awareness. Research on cognitive architectures for language agents (Park et al., 2023) attempts to build this computationally, with memory, reflection, and adaptive coordination.

But the deepest parallel is in the limitation. Stacey insists that emergent patterns cannot be designed from outside; they can only be influenced through participation. Current multi-agent systems face the same challenge: over-specified protocols lose the generative quality that makes interaction valuable; under-specified protocols degenerate into incoherence. The design challenge (creating conditions for productive emergence without controlling it) is identical in both domains.

The ML strategy that org leaders should adopt: Structure your transformation around debate, not consensus. The multi-agent debate results show that structured disagreement produces better outcomes than individual expertise. This means: do not ask your best architect to design the AI strategy alone. Have three teams design it independently, then have each team critique the others’ designs, then iterate. The process is slower. The result is more reliable. The improvement comes not from smarter individuals but from the interaction protocol.

The practical design: every significant decision in the transformation programme should be subjected to a structured red-team exercise before implementation. Not a stakeholder review (which optimises for approval). A genuine adversarial challenge (which optimises for robustness). The Khan et al. result (that a more persuasive challenger produces more truthful outcomes) suggests that the quality of the challenge matters. Do not assign your most junior people to the red team. Assign your most capable critics.

8. The Specification Problem: Where Everything Converges

The Organisational Prompts series argues that the central challenge of AI transformation is specification: the ability to articulate domain knowledge with sufficient precision for AI to act on it. Drucker identified this as the defining challenge of knowledge work: the knowledge worker must define the task before they can do it. In AI-augmented work, the specification is the task.

This has a direct computational interpretation. The quality of an LLM’s output is bounded by the quality of its prompt. The quality of a fine-tuned model is bounded by the quality of its training data. The quality of a world model is bounded by the quality of its training environment. In every case, the constraint is specification.

Consider chain-of-thought prompting (Wei et al., 2022b): including worked reasoning steps in the prompt, or in the later zero-shot variant simply adding “let’s think step by step”, can substantially improve reasoning performance. The model’s capabilities did not change. The specification changed. By asking the model to show its working, the prompt created structure that enabled better reasoning. Tree of Thoughts (Yao et al., 2023) goes further: the model generates multiple reasoning paths, evaluates them, and backtracks from dead ends. The improvement is entirely in the specification of the reasoning process, not in the model’s underlying capabilities.

Retrieval-Augmented Generation (RAG) (Lewis et al., 2020) addresses another aspect. RAG combines the model’s learned knowledge with retrieval from external documents. When asked a question, the system first searches a database for relevant information, then generates its answer using both internal knowledge and retrieved documents. The organisational parallel: the leader who combines experience with consultation of specific documents, data, and domain experts produces better decisions than the leader who relies on experience alone.
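The retrieve-then-generate flow can be sketched without any model at all. The assumptions are deliberately crude: relevance is plain word overlap standing in for vector search, the documents are invented, and the generation step is left out, so the sketch stops at the prompt a model would receive.

```python
import re


def tokens(text):
    """Lower-cased set of words, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, documents, k=1):
    """Rank documents by word overlap with the query (a stand-in for vector search)."""
    query_tokens = tokens(query)
    ranked = sorted(documents, key=lambda d: len(query_tokens & tokens(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Combine the retrieved passages with the question into one prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


# Hypothetical knowledge base.
documents = [
    "Cyber risk policies exclude state-sponsored attacks.",
    "Property risk is assessed per location and construction type.",
    "Broker aggregation limits apply across all active policies.",
]
prompt = build_prompt("How is property risk assessed?", documents)
print(prompt)
```

The model's answer can only be as good as what `retrieve` puts in front of it, which is the specification point again: the quality of the external documents bounds the quality of the output.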

But the specification problem is also where reward hacking connects back. The Skalse et al. (2022) formal result (that any reward function is necessarily an incomplete specification) means that the specification gap is not a failure of skill. It is inherent. No matter how carefully you specify what you want, a sufficiently capable optimiser will find the gap. This is what every organisational theorist from Argyris to Stacey has observed about targets: the specification of what you want is always incomplete, and the more pressure you apply, the more creative the system becomes at satisfying the letter while violating the spirit.

Here is the finding that connects everything in this article: the specification problem cannot be solved computationally alone. It requires exactly the kind of organisational learning that this series documents: the surfacing of mental models (Senge), the double-loop questioning of assumptions (Argyris), the sensemaking that imposes order on ambiguity (Weick), the psychological safety that enables honest reporting (Edmondson), and the deliberate practice that builds genuine expertise rather than fluent automaticity (Ericsson).

The specification is the interface between human knowledge and machine capability. Its quality depends on the organisation’s ability to learn. An organisation that cannot learn cannot specify. And an organisation that cannot specify cannot use AI effectively, regardless of how capable the AI becomes.

9. The Translation Table: ML Strategies for Organisational Learning

The argument of this article rests on a philosophical claim established in its companion essays: LLMs and organisations are not merely analogous. They are collective intelligences that sit at different points on the same continuum, subject to the same structural constraints, exhibiting the same failure modes. Chollet’s formal separation of skill from intelligence applies to both. Levin’s cognitive light cone applies to both. Bateson’s levels of learning describe the same progression in both. The translation table that follows is not a set of metaphors. It is a map between two instances of the same dynamics, operating in different substrates.

The central insight is that machine learning has formalised problems that organisational theory describes qualitatively. Latent representations formalise mental models. Reward hacking formalises defensive routines. Distribution shift formalises the transition between Stacey’s zones of certainty and complexity. The exploration-exploitation tradeoff formalises the conditions for learning. These formalisations do not replace the organisational theories. They sharpen them, making them testable, measurable, and actionable.

But the traffic runs both ways. Argyris described single-loop and double-loop learning decades before anyone built a system that could exhibit both. Weick described sensemaking before anyone built a model that could do in-context learning. Edmondson described psychological safety before anyone formalised the exploration-exploitation tradeoff. Illich distinguished convivial from manipulative institutions before anyone asked whether AI systems amplify or replace human intelligence. The organisational theorists got there first. They saw the dynamics in their substrate. Machine learning is rediscovering them in a different substrate, with mathematical precision and the disadvantage of thinking it is seeing something new.

What both fields need, and neither yet has, is a unified theory of hybrid learning systems, one that explains how human organisations and machine learning systems can learn together. A theory that addresses how human sensemaking and machine pattern-recognition complement and constrain each other. How organisational structures enable or prevent the feedback loops that machine learning requires. How the psychological conditions for human learning (safety, autonomy, mastery) interact with the computational conditions for machine learning (data quality, reward design, exploration). How, in Levin’s terms, the cognitive light cones of the human collective and the computational collective can be aligned rather than allowed to interfere.

This theory does not yet exist. But the raw materials are present in both rooms. The question is whether the people in each room will notice that they are working on the same problem, and whether they will have the intellectual humility and willingness to learn from an unfamiliar discipline that their own theories say is necessary for genuine learning.

That, of course, is a double-loop question. And it is the hardest kind to answer.

Further Reading

Chris Argyris, Donald Schön: Organizational Learning II: Theory, Method, and Practice (1996). The foundational work on single-loop and double-loop learning, defensive routines, and the gap between espoused theory and theory-in-use. Read it alongside any paper on RLHF and reward hacking; the structural parallels are exact.

Karl Weick: Sensemaking in Organizations (1995). The theory of how people impose order on ambiguity. The enactment cycle (select cues, fit to frame, act on interpretation) is the organisational equivalent of in-context learning.

Gregory Bateson: Steps to an Ecology of Mind (1972). The levels of learning and the insistence that mind is a property of the system, not the individual. Read it alongside Levin’s TAME framework and Falandays et al. on collective intelligence; Bateson saw the pattern fifty years before the biology and the computation caught up.

Ralph Stacey: Strategic Management and Organisational Dynamics (5th edition, 2007). The agreement-certainty matrix, complex responsive processes, and the argument that organisations are patterns of interaction, not things that can be designed. The distinction between zones close to and far from certainty maps directly to the in-distribution/out-of-distribution boundary in machine learning.

Peter Senge: The Fifth Discipline (revised edition, 2006). Systems thinking, mental models, team learning. Read the systems archetypes alongside feedback loops in ML training; the same dynamics that make organisations resist learning make models converge on suboptimal solutions.

Amy Edmondson: The Fearless Organization (2018). Psychological safety as the precondition for learning. Read it alongside the exploration-exploitation literature in reinforcement learning; both address the same question: what conditions enable a learning system to explore beyond its current knowledge?

K. Anders Ericsson, Robert Pool: Peak: Secrets from the New Science of Expertise (2016). Deliberate practice and the distinction between experience and expertise. The automaticity trap (performing fluently without improving) is the human equivalent of a model fine-tuned into a local optimum.

Elicit: Machine Learning Reading List. A curriculum for foundation models, from fundamentals to frontier research. The sections on world models, uncertainty, and reinforcement learning are most relevant to this article.

Technical References

Listed in order of appearance. arXiv links provided where available.

Falandays, J. B., et al. (2023). “All Intelligence is Collective Intelligence.” Journal of Multiscale Neuroscience 2(1), 169-191. PDF

Levin, M. (2022). “Technological Approach to Mind Everywhere.” Frontiers in Systems Neuroscience 16, 768201. PMC

Chollet, F. (2019). “On the Measure of Intelligence.” arXiv:1911.01547

Vaswani, A., et al. (2017). “Attention Is All You Need.” NeurIPS 2017. arXiv:1706.03762

Kaplan, J., et al. (2020). “Scaling Laws for Neural Language Models.” arXiv:2001.08361

Hoffmann, J., et al. (2022). “Training Compute-Optimal Large Language Models (Chinchilla).” NeurIPS 2022. arXiv:2203.15556

Wei, J., et al. (2022a). “Emergent Abilities of Large Language Models.” arXiv:2206.07682

Kalai, A.T., Vempala, S.S. (2024). “Calibrated Language Models Must Hallucinate.” STOC 2024. arXiv:2311.14648

Wen, Y., et al. (2024b). “Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning.” arXiv:2512.19920

Farquhar, S., et al. (2024). “Detecting Hallucinations in Large Language Models Using Semantic Entropy.” Nature 630, 625-630.

DeepSeek-AI (2025). “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.” arXiv:2501.12948

Li, K., et al. (2023). “Emergent World Representations.” arXiv:2210.13382

Gurnee, W., Tegmark, M. (2024). “Language Models Represent Space and Time.” arXiv:2310.02207

Elhage, N., et al. (2022). “Toy Models of Superposition.” Anthropic. arXiv:2209.10652

Templeton, A., et al. (2024). “Scaling Monosemanticity.” Anthropic. transformer-circuits.pub

Berglund, L., et al. (2023). “The Reversal Curse.” arXiv:2309.12288

Skalse, J., et al. (2022). “Defining and Characterizing Reward Hacking.” NeurIPS 2022. arXiv:2209.13085

Christiano, P., et al. (2017). “Deep Reinforcement Learning from Human Preferences.” arXiv:1706.03741

Ouyang, L., et al. (2022). “Training Language Models to Follow Instructions with Human Feedback (InstructGPT).” NeurIPS 2022. arXiv:2203.02155

Rafailov, R., et al. (2023). “Direct Preference Optimization.” NeurIPS 2023. arXiv:2305.18290

Bai, Y., et al. (2022). “Constitutional AI.” Anthropic. arXiv:2212.08073

Krakovna, V., et al. (2020). “Specification Gaming: The Flip Side of AI Ingenuity.” DeepMind. deepmindsafetyresearch.medium.com

Greenblatt, R., et al. (2024). “Alignment Faking in Large Language Models.” Anthropic. arXiv:2412.14093

Anthropic (2025). “Natural Emergent Misalignment from Reward Hacking in Production RL.”

Wen, J., et al. (2024a). “Language Models Learn to Mislead Humans via RLHF.” arXiv:2409.12822

Lightman, H., et al. (2023). “Let’s Verify Step by Step.” OpenAI. arXiv:2305.20050

LeCun, Y. (2022). “A Path Towards Autonomous Machine Intelligence.” Meta AI. openreview.net

Assran, M., et al. (2024). “V-JEPA: Revisiting Feature Prediction for Learning Visual Representations from Video.” Meta AI. arXiv:2404.08471

Assran, M., et al. (2025). “V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning.” Meta AI.

Bálint, D., et al. (2025). “Extending Epistemic Uncertainty Beyond Parameters.” arXiv:2506.07448

Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press.

Wei, J., et al. (2022b). “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” arXiv:2201.11903

Yao, S., et al. (2023). “Tree of Thoughts.” NeurIPS 2023. arXiv:2305.10601

Lewis, P., et al. (2020). “Retrieval-Augmented Generation.” NeurIPS 2020. arXiv:2005.11401

Du, Y., et al. (2023). “Improving Factuality and Reasoning through Multiagent Debate.” ICML 2024. arXiv:2305.14325

Khan, A., et al. (2024). “Debating with More Persuasive LLMs Leads to More Truthful Answers.” arXiv:2402.06782

Tool-MAD (2025). “Multi-Agent Debate Framework for Fact Verification.” arXiv:2601.04742

Park, J.S., et al. (2023). “Generative Agents: Interactive Simulacra of Human Behavior.” arXiv:2304.03442

Woolley, A., et al. (2010). “Evidence for a Collective Intelligence Factor in the Performance of Human Groups.” Science 330(6004), 686-688.

Halpin, H. (2025). “Artificial Intelligence versus Collective Intelligence.” AI and Society 40, 4589-4604. Springer

Schrittwieser, J., et al. (2020). “Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (MuZero).” Nature 588, 604-609. arXiv:1911.08265

Silver, D., et al. (2018). “A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go (AlphaZero).” Science 362(6419). arXiv:1712.01815

Burns, C., et al. (2022). “Discovering Latent Knowledge in Language Models Without Supervision.” arXiv:2212.03827

Wang, X., et al. (2023). “Self-Consistency Improves Chain of Thought Reasoning.” arXiv:2203.11171

Domain Driven Design and the Boundary Imperative for AI

Justin Arbuckle — Mon, 27 Apr 2026 07:01:33 GMT

Your teams are busy. The kanban boards are moving. Code is being written, agents are being wrangled, and dashboards are being produced. And yet, every few months the same question surfaces in the steering committee: “Are we building the right things?”

The question is revealing. It is not asked because nothing has been delivered. Plenty has been delivered. It is asked because no one can explain, with structural confidence, why these things were built NOW rather than other things, how the pieces relate to each other, or where the boundaries lie between what one team owns and what another team owns. This is often experienced by teams working on multiple capabilities across an enterprise: how do you coordinate them and their consumption of shared resources? APIs are obviously part of the answer but, as anyone who has designed API infrastructure knows, getting the boundaries or the grain right is the hard part.

This is the domain problem. Not the absence of work, but the absence of the structural clarity that tells you what work belongs where, what language applies in which context, and where the real boundaries of responsibility lie. It is the problem that sits between purpose (knowing why you are transforming) and specification (knowing what to build). You may have answered the purpose question. You may have adopted specification-driven development, for instance. But if you have not decomposed your enterprise into well-defined domains with explicit boundaries and clear ownership, then every specification is an island, every team is guessing about its neighbours, and the AI-generated components your teams are producing will not fit together.

Eric Evans, a software designer who spent the 1990s building large business systems and watching them succeed or fail for reasons that had little to do with the technology, published Domain-Driven Design: Tackling Complexity in the Heart of Software in 2003. Its central argument is deceptively simple:

The most significant complexity in software is not technical. It is in the domain. And the organisations that manage domain complexity well build systems that endure, while those that do not build systems that calcify into legacy the moment the last developer who understood them leaves the building.

Evans’ work is rooted in object-oriented design and has spawned a vast community of practice with its own conferences, pattern extensions, and implementation debates. The tactical patterns (entities, value objects, aggregates, repositories) were important when humans wrote every line of code. In 2026, AI generates the implementation. The tactical patterns still matter, but they are increasingly the AI’s concern, not yours. What remains squarely yours is the strategic design: bounded contexts, context maps, ubiquitous language, and core domain distillation. These are the concepts that determine whether the applications your teams build with AI can talk to each other, compose into a coherent enterprise, and evolve without breaking their neighbours.

Evans himself has said that it would be a shame to do DDD as he wrote it in 2003; the ideas have evolved, and his current work focuses on integrating AI into domain-rich systems. This article addresses the strategic concepts most directly relevant to anyone leading AI transformation in a large enterprise today, and it addresses them from the perspective of the engineer and business analyst whose job is no longer to specify entities and write implementations, but to work with AI to define models, and to ensure that the different applications built by different teams communicate using standard semantics.

1. Knowledge Crunching With AI: The Three-Way Conversation

Evans begins not with architecture but with a learning process. He calls it knowledge crunching: the continuous, iterative collaboration between developers and domain experts to distil a torrent of chaotic information into a practical model. This is not requirements gathering. Requirements gathering assumes the knowledge exists in the domain expert’s head, fully formed, waiting to be extracted and transcribed. Knowledge crunching assumes the opposite: that the knowledge must be constructed through dialogue, experimentation, modelling, and the confrontation of assumptions that both sides did not know they held.

What has changed since 2003, and what changes the practice fundamentally, is that the AI is now a participant in the conversation. Knowledge crunching is no longer a two-way dialogue between the developer and the domain experts. It is a three-way conversation: the domain expert brings the knowledge, the engineer or business analyst brings the structural thinking, and the AI brings the ability to generate, test, and refine models at a pace that was never previously possible.

Consider what this looks like concretely. A team is building an AI-assisted underwriting system for commercial insurance. The domain expert says: “We need to assess the risk of the applicant.” In the old world, the developer would propose a model on a whiteboard: an Applicant entity with a risk_score field computed by a function that takes financial data as input. The domain expert would frown, explain the complexity, and the developer would iterate manually, sketching, erasing, re-drawing.

In 2026, the engineer opens a conversation with the AI. They describe the domain expert’s initial statement. The AI generates a candidate model: perhaps an Applicant entity with a risk_score. The domain expert reacts: “It is not quite like that. We do not score the applicant. We score the risk. The same company might present different risks depending on the line of business: their property risk is different from their liability risk, which is different from their cyber risk. And the risk is not just about the company; it is about the specific exposures they are asking us to cover, the limits they want, their claims history on that line, and the aggregation with our existing book.”

The engineer feeds this correction back to the AI. Within seconds, the AI produces a revised model: a Submission containing multiple RiskAssessment objects, each scoped to a LineOfBusiness, each drawing on different data sources, each subject to different regulatory constraints. The domain expert examines the revised model and spots a further nuance: “The aggregation check is not per submission. It is per broker relationship, across all active policies. And the capacity check is at the syndicate level, not the company level.” The AI revises again. In twenty minutes, the three-way conversation has produced a model that, in the old world, would have taken three weeks of iteration between whiteboard sessions, incorrect implementations, and course corrections after review.
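The revised model from that conversation could be sketched roughly as follows. This is a minimal illustration in Python, not the output of any real session: the names Submission, RiskAssessment and LineOfBusiness come from the text above, while every field name (broker_id, requested_limit, prior_claims_on_line) and the sample data are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class LineOfBusiness(Enum):
    PROPERTY = "property"
    LIABILITY = "liability"
    CYBER = "cyber"


@dataclass(frozen=True)
class RiskAssessment:
    """One assessment per line of business: the risk is scored, not the applicant."""
    line_of_business: LineOfBusiness
    requested_limit: float       # hypothetical field: the limit the applicant wants covered
    prior_claims_on_line: int    # hypothetical field: claims history on this line only


@dataclass
class Submission:
    """A submission for one company, holding several scoped risk assessments."""
    company_name: str
    broker_id: str               # hypothetical field: aggregation is checked per broker relationship
    assessments: list[RiskAssessment] = field(default_factory=list)

    def assessment_for(self, line: LineOfBusiness) -> Optional[RiskAssessment]:
        """Return the assessment scoped to the given line, or None if there is none."""
        return next((a for a in self.assessments if a.line_of_business is line), None)


# Illustrative data only.
submission = Submission(
    company_name="Example Logistics Ltd",
    broker_id="BRK-001",
    assessments=[
        RiskAssessment(LineOfBusiness.PROPERTY, 5_000_000, 1),
        RiskAssessment(LineOfBusiness.CYBER, 1_000_000, 0),
    ],
)
```

The point of the shape is the one the domain expert made: there is no risk_score on the company. A score can only live on a RiskAssessment, so the model itself prevents the original misunderstanding from being expressed.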

The engineer’s role in this conversation is not to specify the entities. The AI can do that.

The engineer’s role is to curate the model.

That means recognising when the domain expert’s correction reveals a structural boundary, challenging the AI’s output when it makes assumptions that do not hold in the domain, and ensuring that the resulting model is expressed in language that can be used consistently across conversations, specifications, and code. The engineer is not the implementer. The engineer is the quality controller of domain understanding.

This is Evans’ knowledge crunching process, accelerated. The constant refinement of the domain model still forces all participants to learn the important principles of the business. What has changed is the speed of iteration and the medium of expression. The model is not a whiteboard sketch that must be manually translated into code. It is a structured artefact, a JSON Schema, an OpenAPI fragment, a domain model expressed in a form the AI can immediately use to generate implementations, tests, and documentation. The knowledge crunching session produces the specification directly.

2. Ubiquitous Language: The Semantic Standard That Determines Whether Applications Compose

Evans’ most powerful concept is not an architectural pattern. It is a linguistic discipline. The ubiquitous language is a shared vocabulary, used consistently by developers and domain experts within a given context, that appears in conversation, documentation, and code. It is not a glossary imposed by a committee. It is the language that emerges from knowledge crunching and is refined through implementation. When a developer uses a term differently from a domain expert, the model is wrong. When the code uses different names from the conversation, the model has drifted from reality. When two teams use the same word to mean different things, a boundary has been crossed without anyone noticing.

In the old world, the ubiquitous language was a discipline that improved code quality and reduced the translation tax between business and technology.

In the AI-augmented world, the ubiquitous language is something more fundamental: it is the semantic standard that determines whether different applications can communicate at all.

When five teams each build applications with AI assistance, and each application needs to exchange data with the others, the question is not whether the APIs are technically compatible. JSON over HTTPS is trivially interoperable at the transport level. The question is whether the meanings are compatible. Does “customer” in the onboarding application mean the same thing as “customer” in the payments application? Does “transaction” in the fraud detection system refer to the same concept as “transaction” in the regulatory reporting system? If the semantic standard has not been defined, each AI-generated application will embed its own interpretation of these terms, and the integrations between them will be built on ambiguity and will drift in unpredictable ways. Note that it is the language at the boundaries that matters. A context may use highly specialised terms internally that are undefined elsewhere, and that is how it should be; but once contexts interact, everyone must agree on what the boundary terms mean.

Over years, the codebase becomes a foreign language that only the original developers can translate (we’ve all been there!), and when they leave, the system becomes legacy not because the technology is outdated but because no one can map the code back to the business. AI doesn’t solve this problem. Just because the code can be parsed by the AI doesn’t mean it means what you think it means…

This matters with particular force in AI-augmented development because ambiguity in language produces ambiguity in specification, and ambiguity in specification produces unpredictable AI output across every application that consumes the specification. The problem multiplies. When one team writes a specification with an ambiguous term, one AI-generated application embeds the ambiguity. When five teams share a term without defining it consistently, five AI-generated applications each embed their own interpretation, and the integrations between them silently misalign.

Here is a concrete example. A retail bank’s payment API specification includes a field called account_holder. In the customer onboarding context, account_holder means the person who passed KYC checks and signed the account agreement: a legal identity with a verified address, a date of birth, and so on. In the payments context, account_holder means the name associated with the sort code and account number: a string that appears on the payee’s bank statement. In the fraud detection context, account_holder means a behavioural profile: a pattern of transaction times, amounts, locations, and device fingerprints.

If the specification for the payment API uses account_holder without defining which meaning applies, each AI-generated application that consumes or produces this field will make its own assumption. The onboarding application’s AI might validate against KYC records. The payments application’s AI might populate a free-text field. The fraud detection application’s AI might try to match against behavioural patterns. Each application works internally. The integrations between them silently misalign, because each side has implemented a different meaning of the same field.

The fix is not better prompting. It is better language: different terms for different concepts, used consistently within each context. AccountHolder, PayeeName, BehaviouralProfile. Three words where there was one, and with them, three specifications that AI can implement unambiguously. The engineer’s job, working with the domain expert and the AI, is to ensure that this semantic precision exists before any application is generated. The ubiquitous language is not documentation. It is infrastructure. It is the semantic standard on which inter-application communication depends.
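As a rough sketch of what “three words where there was one” can look like once it reaches code, the three concepts can be given three distinct types. The type names come from the paragraph above; the fields, the translation function payee_name_from, and the length limit are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass
from datetime import date

MAX_PAYEE_NAME_LENGTH = 18  # hypothetical limit, chosen only for the example


@dataclass(frozen=True)
class AccountHolder:
    """Onboarding context: a legal identity that passed KYC checks."""
    legal_name: str
    date_of_birth: date
    verified_address: str


@dataclass(frozen=True)
class PayeeName:
    """Payments context: the string that appears on a bank statement."""
    value: str


@dataclass(frozen=True)
class BehaviouralProfile:
    """Fraud detection context: a pattern of behaviour, not a person."""
    typical_hours: tuple[int, ...]
    known_device_fingerprints: frozenset[str]


def payee_name_from(holder: AccountHolder) -> PayeeName:
    """Explicit translation between contexts, so the mapping is a visible decision."""
    return PayeeName(holder.legal_name.upper()[:MAX_PAYEE_NAME_LENGTH])


holder = AccountHolder("Ada Lovelace", date(1815, 12, 10), "1 Example Street")
payee = payee_name_from(holder)
```

Because the types differ, passing an AccountHolder where a PayeeName is expected becomes a visible error in review or type checking rather than a silent mismatch between applications.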

Let’s go back to the softer side of the theory discussed earlier in the series. Anthony Giddens would recognise the ubiquitous language as a structure of signification: an interpretive scheme that shapes how people understand the domain. It is reproduced in daily practice, every time a developer names a class, every time a domain expert uses the term in a meeting, every time a specification references it. Bourdieu would go further. The ubiquitous language, once internalised, becomes part of the team’s habitus: the accumulated dispositions that generate practice without conscious deliberation. And Bourdieu would also warn that the existing ubiquitous language, once habituated, becomes resistant to change. The team that has spent three years thinking in one domain model will resist a refactoring of that model at the level of embodied practice, not just intellectual preference. Their habitus will generate the old names, the old structures, the old assumptions, long after the decision to change has been made.

3. Bounded Contexts: Where Semantic Standards Get Their Scope

The ubiquitous language solves the clarity problem within a team. But a large enterprise contains many teams, many domains, and many legitimate perspectives on the same real-world phenomena. As discussed above, “Customer” means something different in sales, support, billing, compliance, and fraud detection, and it should mean something different, because each function engages with a different aspect of the customer relationship. The attempt to build a single unified model of “Customer” that works for every function is not a clarity exercise. It is a confusion engine, producing a bloated, compromise-ridden abstraction that serves nobody well and that every team must work around in practice.

Evans’ answer is the bounded context: a linguistic and model boundary within which a particular domain model applies consistently. Inside a bounded context, all terms have specific, unambiguous meanings. Outside it, different terms, or the same terms with different meanings, apply.

The bounded context is not a microservice. It is not a database schema. It is not a team. It is a commitment to coherence within a defined scope, and a simultaneous acceptance that coherence across the entire enterprise is neither achievable nor desirable.

For engineers and business analysts working with AI, the bounded context answers a question that becomes urgent the moment multiple teams are generating applications: what is the scope of our semantic standard? The ubiquitous language cannot be enterprise-wide, because the enterprise legitimately uses the same words to mean different things in different functions. But it must be rigorously consistent within the boundary of a context, because that is the scope within which AI-generated applications must compose. The bounded context defines where your semantic standard applies, where it ends, and where a different standard begins.

Consider the retail bank again, now at the level of system architecture. The enterprise contains at least these bounded contexts, each with its own semantic standard:

Customer Onboarding. The model here centres on the applicant who becomes an account holder through a series of verification steps: identity check, address verification, sanctions screening, credit assessment, regulatory approval. The ubiquitous language includes terms like KYCStatus, VerificationLevel, RegulatoryApproval, and OnboardingDecision. Any application generated by AI within this context uses these terms consistently. The specification defines APIs for submitting applications, checking verification status, and retrieving onboarding decisions, all expressed in the onboarding language.

Payments. The model centres on the payment: an instruction to move money from one account to another. The ubiquitous language includes Payee, SortCode, AccountNumber, PaymentReference, SettlementDate, and PaymentStatus.

Fraud Detection. The model centres on the transaction as a behavioural event: something to be analysed for anomalies against a behavioural profile. The language includes RiskScore, AlertThreshold, PatternMatch, DeviceFingerprint, and VelocityCheck. A “customer” here is a pattern of behaviour, not a person with an address. The fraud detection context needs to see transaction data, but it must never modify it. It produces alerts; it does not block payments. The decision to block is made in the payments context, which may or may not act on the fraud context’s assessment.
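The division of responsibility between the fraud detection and payments contexts can be sketched as two functions that share only a published alert. This is an illustrative sketch under stated assumptions: the scoring rule, the 0.0 to 1.0 scale, the BLOCK_THRESHOLD value and all function names are hypothetical, not taken from any real system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


@dataclass(frozen=True)
class FraudAlert:
    """Published by the fraud detection context; immutable for every consumer."""
    payment_reference: str
    risk_score: float  # assumed scale: 0.0 (benign) to 1.0 (almost certainly fraud)


class PaymentStatus(Enum):
    ACCEPTED = "accepted"
    BLOCKED = "blocked"


# Fraud detection context: reads transaction data and produces alerts, nothing else.
def raise_alert_if_anomalous(
    payment_reference: str, amount: float, usual_maximum: float
) -> Optional[FraudAlert]:
    if amount <= usual_maximum:
        return None
    score = min(1.0, amount / (usual_maximum * 10))  # toy scoring rule for the example
    return FraudAlert(payment_reference, score)


# Payments context: owns the decision to block, and owns the threshold.
BLOCK_THRESHOLD = 0.8  # hypothetical policy value


def decide(alert: Optional[FraudAlert]) -> PaymentStatus:
    if alert is not None and alert.risk_score >= BLOCK_THRESHOLD:
        return PaymentStatus.BLOCKED
    return PaymentStatus.ACCEPTED
```

The fraud function never returns a PaymentStatus and never touches the payment itself, which mirrors the rule in the text: fraud detection produces alerts, and the payments context decides whether to act on them.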

The enterprise does not achieve clarity by building one model. It achieves clarity by building many models, each coherent within its boundary, and managing the interfaces between them. The engineer’s job is no longer to build these models manually. It is to ensure that the boundaries are defined, the language within each boundary is rigorous, and the contracts between boundaries are explicit. The AI generates the applications. The humans govern the semantics.

4. Context Maps and Standard Semantics: How Boundaries Talk to Each Other

Bounded contexts do not exist in isolation. The fraud detection context needs transaction data from the payments context. The regulatory reporting context needs data from both payments and onboarding. The onboarding context needs to know whether the payment system supports the account type being opened. The question is not whether contexts communicate, but how, and who controls the terms of the communication.

This is where the engineer’s role shifts from model curation to semantic governance. Within a bounded context, the engineer works with the domain expert and the AI to define the model and its language. Between bounded contexts, the engineer’s job is to define the translation rules: the contracts that determine how one context’s language maps to another’s, and the standards by which that mapping is expressed.

Evans defines a context map: a visual and descriptive overview of all the bounded contexts in a system and the relationships between them. The context map is not an architecture diagram in the traditional sense. It is a map of integration relationships, and those relationships are as much about semantics as they are about technology.

Evans provides a vocabulary for these relationships that makes the power dynamics and semantic commitments explicit.

Partnership. Two teams with a mutual dependency coordinate their plans and development. Both sides negotiate the integration contract and the shared semantics. This works when the teams have roughly equal power and shared objectives. In practice, it is the rarest pattern, because genuine partnership requires investment from both sides that neither governance framework nor budget allocation typically supports.
Customer-Supplier. One team (the supplier, or “upstream”) provides a service that another team (the customer, or “downstream”) depends on. The downstream team can make requests; the upstream team decides whether and when to fulfil them. The semantic implication is that the upstream team defines the language of the contract. In a healthy relationship, the upstream team considers the downstream team’s semantic needs. In an unhealthy one, the upstream team ships whatever model suits its own domain, and the downstream team copes.
Conformist. The downstream team has no influence over the upstream team’s model and simply conforms to whatever the upstream provides. There is no translation, no negotiation, and no accommodation. The downstream team adopts the upstream team’s language and model, even when it does not fit their domain. This is common when integrating with third-party platforms, legacy systems, or internal platform teams that do not recognise the downstream team’s existence. In AI terms, the downstream team’s AI-generated applications must use the upstream’s terminology, even when it conflicts with the downstream’s own domain language.
Anticorruption Layer. The downstream team builds a translation layer that converts the upstream model into its own model. The upstream system’s language and structures never leak into the downstream domain. This is the most expensive integration pattern and also the most protective: it preserves the semantic integrity of the downstream model at the cost of building and maintaining a translator. When two AI-generated applications need to communicate and their semantic models are fundamentally different, the anticorruption layer is where the translation happens. The engineer defines the mapping; the AI can generate the implementation.
Open Host Service / Published Language. The upstream team defines a well-documented protocol, the published language, that any downstream team can build against. This is the pattern behind public APIs, standardised event schemas, and shared specification artefacts. It is the pattern that specification-driven development naturally produces: an OpenAPI specification, an AsyncAPI specification, or a JSON Schema that defines the contract in a machine-readable form that any consumer, human or AI, can build against. The published language is the standard semantic contract.
Separate Ways. Two contexts have no integration at all. They operate independently. This is the right choice when the cost of integration exceeds its value, and it is a choice that most architecture teams are reluctant to make because it looks like a gap in the design rather than a deliberate decision.
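Of the patterns above, the anticorruption layer is the easiest to show in code. The following is a minimal sketch, assuming a legacy upstream that emits records with its own field names; those legacy names (TXN_REF, BENEF_NM, AMT, VAL_DT), the Payment fields and the sample record are all invented for the example.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Payment:
    """Downstream model, expressed in the payments context's own language."""
    payment_reference: str
    payee: str
    amount_pence: int
    settlement_date: date


def translate_legacy_payment(record: dict) -> Payment:
    """Anticorruption layer: the only code that knows the upstream field names."""
    return Payment(
        payment_reference=record["TXN_REF"].strip(),
        payee=record["BENEF_NM"].strip(),
        # Decimal avoids float rounding when converting pounds to pence.
        amount_pence=int(Decimal(record["AMT"]) * 100),
        settlement_date=date.fromisoformat(record["VAL_DT"]),
    )


# Hypothetical upstream record.
legacy = {"TXN_REF": " A123 ", "BENEF_NM": "J SMITH", "AMT": "12.50", "VAL_DT": "2026-06-01"}
payment = translate_legacy_payment(legacy)
```

Everything downstream of translate_legacy_payment sees only Payment. If the upstream renames a field, one function changes; the downstream language does not.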

The context map reveals things that architecture diagrams do not. It reveals which teams are in conformist relationships they never chose, silently absorbing the semantic assumptions of upstream systems they cannot influence. It reveals which integrations are missing anticorruption layers, allowing one domain’s language to pollute another’s. It reveals where the published language is actually just an undocumented API that happens to work today and will break without warning tomorrow.

For engineers managing AI-generated applications, the context map is the governance tool for inter-application semantics. It answers: which applications must agree on terminology? Where are the translation layers? Where must we invest in explicit contracts, and where can we accept semantic coupling? Without the map, each team generates applications that work internally but cannot compose, because nobody has defined how the languages translate across boundaries.
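A context map does not have to stay on a whiteboard. One possible way to make it queryable is to record it as data, as in the sketch below. The relationship names follow Evans’ vocabulary as listed earlier; the specific links between contexts and the helper conformist_downstreams are assumptions for illustration and say nothing about how a real bank is wired.

```python
from dataclasses import dataclass
from enum import Enum


class Relationship(Enum):
    PARTNERSHIP = "partnership"
    CUSTOMER_SUPPLIER = "customer-supplier"
    CONFORMIST = "conformist"
    ANTICORRUPTION_LAYER = "anticorruption layer"
    PUBLISHED_LANGUAGE = "open host service / published language"
    SEPARATE_WAYS = "separate ways"


@dataclass(frozen=True)
class Link:
    upstream: str
    downstream: str
    relationship: Relationship


# Hypothetical map for the retail bank example.
CONTEXT_MAP = [
    Link("payments", "fraud-detection", Relationship.PUBLISHED_LANGUAGE),
    Link("payments", "regulatory-reporting", Relationship.CONFORMIST),
    Link("onboarding", "regulatory-reporting", Relationship.ANTICORRUPTION_LAYER),
]


def conformist_downstreams(links: list[Link]) -> list[str]:
    """Contexts that absorb an upstream model with no translation at all."""
    return sorted({link.downstream for link in links
                   if link.relationship is Relationship.CONFORMIST})
```

A query such as conformist_downstreams(CONTEXT_MAP) turns the question “which teams are in conformist relationships they never chose?” into something that can be checked whenever the map changes.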

5. Discovering Domains: How to Find the Boundaries You Cannot Yet See

Evans provides the concepts: bounded contexts, ubiquitous language, context maps. What he does not provide, at least in the original 2003 text, is a structured workshop method for discovering where the boundaries actually lie. Knowledge crunching is iterative and ongoing, but how do you get started? How do you walk into a large enterprise with a monolithic system and a decade of accumulated technical debt and identify the bounded contexts that should have existed all along?

The DDD community has developed several complementary techniques for this. Each works differently, and each reveals different things.

EventStorming. Alberto Brandolini’s workshop method, developed from 2013 onwards, has become the most widely used domain discovery technique in the DDD community. The approach is deliberately low-tech: a long wall, unlimited orange sticky notes, and a room full of people who understand different parts of the business. The rule is simple: write domain events on the orange stickies, expressed as verbs in the past tense (“Payment Submitted,” “Account Opened,” “Fraud Alert Raised,” “Regulatory Report Filed”), and place them on a timeline from left to right.

Domain Storytelling. Where EventStorming begins with events and works outward, Domain Storytelling begins with people and their interactions. Developed by Stefan Hofer and Henning Schwentner, the technique asks domain experts to tell concrete stories about how work gets done: “The underwriter receives a submission from the broker, reviews the risk assessment for each line of business, checks the aggregation against our existing book, and either issues a quote or refers the submission to the senior underwriter for review.” The facilitator captures each story as a pictographic diagram: actors, work objects, and the activities that connect them, numbered in sequence. The resulting diagram is a visual narrative of a single concrete workflow. It is particularly effective with domain experts who are uncomfortable with the energy and ambiguity of a big picture workshop, and it produces clear, readable process descriptions that domain experts can validate immediately.

Wardley Mapping. Simon Wardley’s strategic mapping technique operates at a different level of abstraction. A Wardley Map visualises the components in a value chain, positioning each on an evolution axis from genesis (novel, uncertain, changing rapidly) through custom-built and product to commodity (standardised, well-understood, stable). Wardley Mapping connects directly to Evans’ Core Domain distillation: components in the genesis and custom-built stages are candidates for deep investment in domain modelling and specification; components in the commodity stage are generic subdomains where off-the-shelf solutions apply.

6. Core Domain Distillation: Where to Invest Your Semantic Precision

Not every bounded context deserves the same investment in semantic rigour. Evans introduces Core Domain distillation: the discipline of identifying which part of the enterprise model provides competitive advantage, and concentrating the best modelling effort there.

The Core Domain is the thing that makes this organisation different from its competitors.

Everything else falls into two categories: supporting subdomains (necessary for the core but not distinctive) and generic subdomains (the same in every company and therefore candidates for off-the-shelf solutions).

For engineers and business analysts working with AI, distillation answers the investment question that most transformation programmes never ask: where should we invest in deep domain modelling, rigorous specification, and carefully curated semantic standards, and where can we accept generic AI-generated applications with minimal domain-specific governance?

A retail bank’s Core Domain might be its credit risk modelling, its customer relationship management, or its real-time fraud detection, depending on where its competitive advantage actually lies. Its email system, its document generation, and its authentication mechanism are generic subdomains. They need to work. They do not need bespoke semantic standards. Its regulatory reporting is a supporting subdomain: essential for operating legally, specific to the banking industry, but not the thing that wins customers.

The Core Domain deserves the deepest three-way knowledge crunching conversations: domain experts, engineers, and AI iterating intensively to produce models and specifications that capture genuine competitive advantage. This is where the ubiquitous language must be most precise, the semantic standards most rigorous, and the published language contracts most carefully governed. Generic subdomains should use generic AI tools with generic specifications. The common mistake, which Evans’ framework exposes with painful clarity, is investing senior engineers in perfecting the semantic standard for a document management system while leaving the credit risk domain to junior analysts and generic prompts. The technology is impressive. The investment logic is incoherent.

Distillation also provides the sequencing logic. You do not transform the entire enterprise simultaneously. You identify the Core Domain, invest in knowledge crunching and specification development there first, and let the supporting and generic subdomains follow with less intensive approaches. The Core Domain is where you learn what AI-augmented development actually requires in your specific context. The lessons from that investment propagate outward to less critical domains.
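The sequencing logic is simple enough to express directly. The sketch below assumes a hypothetical portfolio in which each subdomain has already been classified; the classifications echo the retail bank example above, and transformation_order is an invented helper, not part of any framework.

```python
from enum import IntEnum


class SubdomainType(IntEnum):
    CORE = 0         # competitive advantage: deepest modelling investment, first in line
    SUPPORTING = 1   # necessary but not distinctive
    GENERIC = 2      # same in every company: off-the-shelf candidates


# Hypothetical classification for one bank; another bank may classify differently.
PORTFOLIO = {
    "credit-risk-modelling": SubdomainType.CORE,
    "regulatory-reporting": SubdomainType.SUPPORTING,
    "document-generation": SubdomainType.GENERIC,
    "authentication": SubdomainType.GENERIC,
}


def transformation_order(portfolio: dict[str, SubdomainType]) -> list[str]:
    """Core first, then supporting, then generic; alphabetical within each tier."""
    return sorted(portfolio, key=lambda name: (portfolio[name], name))
```

The hard part is the classification, not the sort. The code only makes the consequence explicit: whatever is labelled CORE gets the senior attention and goes first.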

7. Published Languages at Scale: How MCP, A2A, and Open Standards Change the Boundary Problem

Evans’ Open Host Service / Published Language pattern describes the ideal: a bounded context exposes a well-documented, stable protocol that any other context can build against. In 2003, when Evans wrote, the tooling for this pattern was limited. Today, a convergence of open specification standards and agent interoperability protocols is making the Published Language pattern not just achievable but, increasingly, the default architecture for AI-augmented systems.

This is where the engineer’s role as semantic governor becomes most concrete. The specification standards are the mechanisms by which semantic standards are published, and the interoperability protocols are the mechanisms by which AI-generated applications discover and consume those standards.

The specification standards are already well established. OpenAPI defines REST API contracts in a machine-readable format: endpoints, request and response schemas, validation rules, authentication requirements, and error responses. AsyncAPI extends the same principle to event-driven architectures: message schemas, channel definitions, protocol bindings for Kafka, AMQP, WebSocket, and others. JSON Schema provides the vocabulary for validating data structures, and underpins both OpenAPI and AsyncAPI. These are the published languages of the current software ecosystem, and they map directly to the bounded context boundary. Each context publishes its API contract as an OpenAPI or AsyncAPI specification. Downstream contexts build against the specification, not the implementation. The specification is the boundary: it defines what the context exposes, what it accepts, what it guarantees, and what it does not. For engineers working with AI, the specification is also the input: it is what the AI reads to generate both implementations and integrations.
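To make “build against the specification, not the implementation” concrete, here is a small sketch of a JSON-Schema-shaped contract for a payments event and a check that a message honours it. The property names adapt the payments vocabulary used earlier; the status values and the violations helper are assumptions. The helper deliberately covers only required fields, primitive types and enums, a small subset of JSON Schema; a real project would use a full validator.

```python
PAYMENT_SUBMITTED_SCHEMA = {
    "type": "object",
    "required": ["paymentReference", "payee", "sortCode", "accountNumber", "settlementDate"],
    "properties": {
        "paymentReference": {"type": "string"},
        "payee": {"type": "string"},
        "sortCode": {"type": "string"},
        "accountNumber": {"type": "string"},
        "settlementDate": {"type": "string"},
        # Hypothetical status values, for illustration only.
        "paymentStatus": {"type": "string", "enum": ["pending", "settled", "rejected"]},
    },
}

_PYTHON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def violations(message: dict, schema: dict) -> list[str]:
    """Report missing required fields, wrong primitive types and disallowed enum values."""
    problems = [f"missing: {name}" for name in schema.get("required", []) if name not in message]
    for name, rules in schema.get("properties", {}).items():
        if name not in message:
            continue
        value = message[name]
        if not isinstance(value, _PYTHON_TYPES[rules["type"]]):
            problems.append(f"wrong type: {name}")
        elif "enum" in rules and value not in rules["enum"]:
            problems.append(f"not allowed: {name}={value!r}")
    return problems


valid_message = {
    "paymentReference": "INV-42",
    "payee": "J SMITH",
    "sortCode": "123456",
    "accountNumber": "12345678",
    "settlementDate": "2026-06-01",
}
```

A downstream team, or its AI, can run the same check without ever seeing the upstream code. That is what it means for the specification to be the boundary.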

What is new, and what changes the domain problem fundamentally, is the emergence of protocols designed for AI agent interoperability: mechanisms by which AI-powered applications in different bounded contexts can discover, negotiate, and communicate with each other.

Model Context Protocol (MCP). Released by Anthropic in 2024 as an open standard, MCP provides a standardised way for AI applications to connect with external tools, databases, and services. In DDD terms, MCP is the protocol that allows an AI-generated application to interact with the internal resources of its bounded context: the databases, the services, the validation tools, the domain-specific functions. It standardises what was previously custom glue code between the model and its environment. An AI agent operating within the payments bounded context uses MCP to access the payments database, invoke the payments validation service, and query the payments reference data, all through a standardised interface expressed as JSON Schema tool definitions. The engineer’s job is to define those tool definitions in the ubiquitous language of the payments context, so that the AI agent speaks the domain’s language when interacting with the domain’s resources.
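A tool definition written in the payments language might look something like the sketch below. The top-level keys name, description and inputSchema reflect the shape MCP uses for tool definitions as I understand the specification; the tool itself (validate_payment), its properties, the patterns and the helper undeclared_required_fields are hypothetical and should be checked against the current specification before use.

```python
import json

# Hypothetical MCP-style tool definition for the payments context.
VALIDATE_PAYMENT_TOOL = {
    "name": "validate_payment",
    "description": "Validate a payment instruction against payments-context rules.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "payee": {"type": "string", "description": "Payee name as shown on the statement."},
            "sortCode": {"type": "string", "pattern": "^[0-9]{6}$"},
            "accountNumber": {"type": "string", "pattern": "^[0-9]{8}$"},
            "paymentReference": {"type": "string"},
            "settlementDate": {"type": "string", "format": "date"},
        },
        "required": ["payee", "sortCode", "accountNumber", "settlementDate"],
    },
}


def undeclared_required_fields(tool: dict) -> list[str]:
    """Sanity check: every required field must also be declared under properties."""
    schema = tool["inputSchema"]
    declared = schema.get("properties", {})
    return [name for name in schema.get("required", []) if name not in declared]


# The definition is plain JSON, so it can be published and reviewed like any other artefact.
tool_as_json = json.dumps(VALIDATE_PAYMENT_TOOL, indent=2)
```

Note what is absent: there is no account_holder here. The tool speaks of a payee, because that is the term the payments context owns.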

Agent-to-Agent Protocol (A2A). Announced by Google in April 2025 with backing from over fifty technology partners and now an open-source Linux Foundation project, A2A addresses a different problem: how AI-generated applications in different bounded contexts discover each other and collaborate. Where MCP standardises the relationship between an agent and its tools, A2A standardises the relationship between agents. Each agent publishes an Agent Card: a JSON metadata document that describes its capabilities, its supported input and output modalities, its authentication requirements, and its service endpoint. Other agents use the Agent Card to discover what a remote agent can do and how to interact with it. A2A supports structured task lifecycle management (creation, progress updates, completion, failure), real-time streaming via server-sent events, asynchronous push notifications for long-running tasks, and multimodal data exchange.

The DDD parallel is exact. The Agent Card is a machine-readable Published Language: it tells consuming agents what this context’s application can do, what it expects, and what it returns. The task lifecycle is a formalised integration contract. The authentication and authorisation mechanisms enforce boundary integrity. A2A treats agents as opaque: the consuming agent does not need to know the internal architecture, the model, or the tools of the remote agent. It only needs to know the published interface. This is precisely Evans’ bounded context principle applied to AI systems: strong internal coherence, loose external coupling, and explicit boundary contracts.
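For a sense of what such a published interface contains, here is a hedged sketch of an Agent Card for the fraud detection context. The field names follow my reading of earlier published versions of the A2A specification and may differ in the current one; the agent, its endpoint URL, the skill and the helper published_skill_ids are all invented for illustration.

```python
import json

# Hypothetical Agent Card; field names should be verified against the current A2A specification.
FRAUD_AGENT_CARD = {
    "name": "Fraud Detection Agent",
    "description": "Raises fraud alerts for payments; never blocks or modifies them.",
    "url": "https://agents.example.com/fraud-detection",  # placeholder endpoint
    "version": "1.0.0",
    "capabilities": {"streaming": True, "pushNotifications": True},
    "defaultInputModes": ["application/json"],
    "defaultOutputModes": ["application/json"],
    "skills": [
        {
            "id": "velocity-check",
            "name": "VelocityCheck",
            "description": "Scores a transaction against the behavioural profile.",
            "tags": ["fraud", "RiskScore"],
        }
    ],
}


def published_skill_ids(card: dict) -> list[str]:
    """Everything a consuming agent may rely on: the published skills, not the internals."""
    return [skill["id"] for skill in card.get("skills", [])]
```

The card says what the agent does and in which terms (VelocityCheck, RiskScore). It says nothing about which model, tools or database sit behind it, which is the opacity the paragraph above describes.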

What this means for the engineer and business analyst is significant. Evans’ context mapping patterns (partnership, customer-supplier, conformist, anticorruption layer, published language, separate ways) were originally described in terms of human teams and their codebases.

The agent interoperability protocols provide the technical substrate for implementing these patterns between AI-generated applications. A customer-supplier relationship between two AI agents is mediated by A2A: the upstream agent publishes an Agent Card; the downstream agent discovers it, sends tasks, and receives results. An anticorruption layer between an AI agent and a legacy system is mediated by MCP: the agent accesses the legacy data through a standardised connector that translates the legacy model into the agent’s domain model. The published language pattern is no longer just an OpenAPI specification served by a gateway. It is an Agent Card, an OpenAPI specification, an AsyncAPI event schema, and an MCP server configuration, all expressing the same bounded context boundary in formats that both humans and machines can consume.

The engineer’s job is to ensure that these artefacts are semantically consistent: that the Agent Card describes capabilities in the same ubiquitous language as the OpenAPI specification, the AsyncAPI event schemas, the MCP tool definitions, and the domain model that the knowledge crunching sessions produced. If the Agent Card uses one set of terms and the OpenAPI specification uses another, the boundary contract is incoherent regardless of how well each artefact works in isolation. Standard semantics means semantic consistency across all the artefacts that define a bounded context’s boundary. The protocols provide the transport. The ubiquitous language provides the meaning.
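Semantic consistency across artefacts can be checked mechanically once the terms each artefact uses have been extracted. The sketch below assumes that extraction has already happened and that the results are simple sets of terms; the artefact names, the term sets and the function inconsistent_terms are hypothetical.

```python
# The ubiquitous language of the payments context, as agreed in knowledge crunching.
PAYMENTS_LANGUAGE = {
    "Payee", "SortCode", "AccountNumber",
    "PaymentReference", "SettlementDate", "PaymentStatus",
}

# Hypothetical terms extracted from each boundary artefact.
ARTEFACT_TERMS = {
    "agent-card": {"Payee", "PaymentStatus"},
    "openapi": {"Payee", "SortCode", "AccountNumber"},
    "mcp-tools": {"Payee", "Beneficiary"},  # "Beneficiary" is not in the agreed language
}


def inconsistent_terms(
    artefacts: dict[str, set[str]], language: set[str]
) -> dict[str, set[str]]:
    """Map each artefact to the terms it uses that fall outside the ubiquitous language."""
    return {
        name: terms - language
        for name, terms in artefacts.items()
        if terms - language
    }


drift = inconsistent_terms(ARTEFACT_TERMS, PAYMENTS_LANGUAGE)
```

A check like this could run in a build pipeline. It will not tell you whether Beneficiary or Payee is the right word, only that the artefacts disagree, which is the signal that a conversation with the domain expert is needed.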

8. Evans and AI: The Engineer as Semantic Architect

At the Explore DDD conference in March 2024, Evans addressed the AI question directly. His insight was characteristically structural: a trained language model is a bounded context.

The argument is this. A generic LLM, trained on the broad corpus of human language, is a general-purpose tool. It can generate plausible text about anything, but it has no deep understanding of any particular domain. When you prompt it with domain-specific questions, you must construct careful, elaborate prompts to compensate for its lack of domain knowledge. The output is plausible but shallow.

A language model operating within a well-defined bounded context, grounded in the ubiquitous language through system prompts, specification artefacts, and domain-specific MCP tool definitions, is a different thing. It responds naturally to domain terms. It generates output that reflects the model, not just statistical patterns in general language. It does not need elaborate prompt scaffolding because it already speaks the domain’s language.

Evans’ decomposition principle applies: instead of one large, general-purpose AI generating applications across the enterprise, you should have several domain-specific AI configurations, each aligned to a bounded context, each grounded in that context’s ubiquitous language, each producing output that is coherent within its domain’s model. This is separation of concerns applied to AI architecture.

The practical architecture looks like this. The payments bounded context has its own AI configuration: its own system prompts grounded in the payments ubiquitous language, its own specification artefacts (OpenAPI schemas, JSON Schema definitions, validation rules) that constrain generation, and its own test suites that validate output against the payments domain model. The fraud detection context has a different configuration: different prompts, different schemas, different validation. They communicate through the same integration patterns (anticorruption layers, published languages, customer-supplier contracts) that govern their non-AI interactions, now formalised through protocols like MCP and A2A.

Policy files that cut across contexts, such as those for security and compliance, may be shared across these configurations. They can themselves be treated as a Published Language in Evans’ sense.
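As a rough sketch of what “one configuration per bounded context” might look like when written down, the Python below models two configurations that share nothing except the cross-cutting policy files. The ContextAIConfig class, its field names, and every file name are assumptions made for illustration; they do not correspond to any particular tool’s configuration format.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical cross-cutting policy files shared by every context.
SHARED_POLICIES = ["security-policy.md", "compliance-policy.md"]


@dataclass
class ContextAIConfig:
    """AI configuration scoped to a single bounded context (illustrative shape)."""
    context: str
    system_prompt: str
    schemas: list[str]
    test_suites: list[str]
    shared_policies: list[str] = field(default_factory=lambda: list(SHARED_POLICIES))


def shared_artefacts(a: ContextAIConfig, b: ContextAIConfig) -> set[str]:
    """Files that both configurations reference."""
    files_a = set(a.schemas) | set(a.test_suites) | set(a.shared_policies)
    files_b = set(b.schemas) | set(b.test_suites) | set(b.shared_policies)
    return files_a & files_b


payments = ContextAIConfig(
    context="payments",
    system_prompt="Use the payments ubiquitous language: payment, payee, settlement.",
    schemas=["payments.openapi.yaml", "payment.schema.json"],
    test_suites=["payments_domain_tests"],
)
fraud = ContextAIConfig(
    context="fraud-detection",
    system_prompt="Use the fraud ubiquitous language: alert, case, risk score.",
    schemas=["fraud.asyncapi.yaml", "alert.schema.json"],
    test_suites=["fraud_domain_tests"],
)

# Only the policy files should appear in both configurations.
overlap = shared_artefacts(payments, fraud)
print(sorted(overlap))
```

If anything other than the policy files shows up in the overlap, two contexts are quietly sharing a schema or a test suite, which is worth a conversation about whether the boundary is where the context map says it is.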

This redefines the engineer’s role. The engineer is no longer the person who writes the implementation. The AI does that. The engineer is the person who ensures that the domain model is correct, the ubiquitous language is precise, the specifications are rigorous, the bounded context boundaries are clear, and the integration contracts between contexts use standard semantics. The engineer is, in Evans’ terms, the curator of the model. In architectural terms, the engineer is the semantic architect: the person responsible for ensuring that the enterprise’s AI-generated applications compose into a coherent whole because they are built on shared semantic foundations.

The business analyst’s role shifts in parallel. The analyst is no longer the person who writes requirements documents for developers to implement. The analyst is the person who sits in the knowledge crunching session with the domain expert and the AI, ensuring that the model captures the business reality, that the specifications reflect genuine domain understanding, and that the language used in one context does not silently conflict with the language used in another. The analyst becomes the semantic quality controller: the person who tests not whether the code compiles, but whether the meanings are right.

9. Deciding What to Build: The Structural Answer

Evans provides the missing layer in the deciding sequence. A specification is always a specification of something, within some context, for some purpose. Without an architecture of domains, specifications are locally precise but globally incoherent.

The sequence for deciding what to build, informed by Evans and adapted for engineers and business analysts working with AI, looks like this:

First, identify your bounded contexts. Not from an architecture diagram. From the language and the behaviour. Run a Big Picture EventStorming session. Place domain events on the wall. Watch where the naming diverges, where different groups tell different stories, where the hotspots cluster. Each linguistic boundary is a candidate context boundary and a candidate scope for a semantic standard.

Second, draw the context map. For each pair of communicating contexts, name the relationship pattern. Is it partnership, customer-supplier, conformist, anticorruption layer, published language, or separate ways? Be honest about the power dynamics and the semantic commitments. The context map tells you where you need explicit translation layers and where you need published contracts. In short, it tells you where your semantic governance effort must be concentrated.
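A context map can also be kept as data rather than only as a diagram. The sketch below is a minimal illustration, not a prescribed format: the ContextLink shape, the sample contexts, and in particular the rule for which relationship patterns call for a translation layer or a published contract are simplifying assumptions that a real team would decide for itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Relationship(Enum):
    PARTNERSHIP = "partnership"
    CUSTOMER_SUPPLIER = "customer-supplier"
    CONFORMIST = "conformist"
    ANTICORRUPTION_LAYER = "anticorruption layer"
    PUBLISHED_LANGUAGE = "published language"
    SEPARATE_WAYS = "separate ways"


@dataclass(frozen=True)
class ContextLink:
    upstream: str
    downstream: str
    relationship: Relationship


# Assumption for this sketch: which patterns imply which governance work.
NEEDS_TRANSLATION = {Relationship.ANTICORRUPTION_LAYER}
NEEDS_PUBLISHED_CONTRACT = {
    Relationship.PUBLISHED_LANGUAGE,
    Relationship.CUSTOMER_SUPPLIER,
}


def governance_hotspots(links: list[ContextLink]) -> dict[str, list[ContextLink]]:
    """Group links by the kind of semantic governance they appear to need."""
    return {
        "translation_layers": [
            link for link in links if link.relationship in NEEDS_TRANSLATION
        ],
        "published_contracts": [
            link for link in links if link.relationship in NEEDS_PUBLISHED_CONTRACT
        ],
    }


# Illustrative context map with hypothetical context names.
links = [
    ContextLink("legacy-ledger", "payments", Relationship.ANTICORRUPTION_LAYER),
    ContextLink("payments", "fraud-detection", Relationship.CUSTOMER_SUPPLIER),
    ContextLink("marketing", "payments", Relationship.SEPARATE_WAYS),
]

hotspots = governance_hotspots(links)
for kind, found in hotspots.items():
    for link in found:
        print(f"{kind}: {link.upstream} -> {link.downstream}")
```

Keeping the map in this form means it can be versioned alongside the specifications, and a change of relationship pattern becomes a reviewable change rather than a redrawn diagram.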

Third, distil the Core Domain. Which context, or which part of a context, is the source of competitive advantage? A Wardley Map can inform this assessment by positioning each domain component on the evolution axis. This is where you concentrate your best people, your deepest knowledge crunching, and your most rigorous specification and semantic standard work.

Fourth, invest in knowledge crunching where it matters most. The Core Domain gets intensive, sustained three-way modelling work: domain experts, engineers, and AI in the same room, building the ubiquitous language, iterating the model through Process-Level EventStorming or Domain Storytelling sessions, producing specifications that reflect genuine domain understanding. Supporting subdomains get lighter treatment. Generic subdomains get off-the-shelf solutions.

Fifth, write specifications scoped to bounded contexts, expressed in the ubiquitous language. Each specification uses the language of its context. Each specification defines the contracts (OpenAPI for synchronous APIs, AsyncAPI for event-driven interfaces, JSON Schema for shared data structures) that govern the context’s boundaries. Each specification can be consumed by AI to generate implementations with confidence that the terms are unambiguous within scope. The semantic standard within the context is the ubiquitous language; the semantic standard at the boundary is the published language contract.

Sixth, deploy AI within bounded context boundaries, using MCP for internal integration and A2A for cross-context communication. Each context gets its own AI configuration: its own prompts grounded in the ubiquitous language, its own MCP server connecting it to context-specific tools and data sources, its own validation suites. Cross-context AI communication follows the integration patterns defined in the context map, with Agent Cards, OpenAPI specifications, and AsyncAPI schemas all expressing the same boundary contract in the same semantic terms.

This is the structural answer to “are we building the right things?” The answer is: we are building the things that our domain model tells us to build, within the boundaries our context map defines, with investment concentrated where our distillation exercise says it matters most. The applications are generated by AI. The semantic architecture that ensures they compose is governed by humans. The backlogs are no longer arbitrary lists of features. They are expressions of domain models, governed by specifications, scoped to contexts, and prioritised by strategic importance.

(An Organisational Prompt is something you can do now...)

Organisational Prompt

Evans’ most powerful diagnostic is linguistic friction: the moment when the same word means different things to different applications, and nobody has defined which meaning applies at the boundary.

Choose one integration point where two of your organisation’s applications exchange data. It might be a REST API call, an event on a message bus, a shared database, or a file transfer. Now examine the contract between them. Is it expressed as a formal specification (OpenAPI, AsyncAPI, JSON Schema)? Or is it implicit: understood by the developers who built both sides, but not written down in a form that an AI, or a new team member, could consume easily?

If there is a formal specification, check its language. Does the specification use the same terms that the domain experts in each team use when they talk about their work? Or has the specification drifted into technical jargon that neither domain expert would recognise? Pick three key field names from the contract. Ask a domain expert on each side of the integration what those fields mean. Compare the answers.

If the answers diverge, you have found a semantic boundary that has no governance. The two applications are exchanging data, but they are not exchanging meaning. When AI generates the next version of either application, it will embed its own interpretation of the ambiguous terms, and the integration will silently degrade.

Further Reading

Eric Evans: Domain-Driven Design: Tackling Complexity in the Heart of Software (2003). The foundational text. Read Part I (Putting the Domain Model to Work) and Part IV (Strategic Design) for the organisational and architectural insights. The tactical patterns in Parts II and III matter for implementation but are increasingly the AI’s concern rather than the engineer’s.

Alberto Brandolini: Introducing EventStorming (Leanpub, in progress). The definitive guide to EventStorming from its creator. Still being completed, but the available chapters cover big picture, process-level, and software design sessions with worked examples. Essential reading for anyone planning to run a domain discovery workshop.

Stefan Hofer and Henning Schwentner: Domain Storytelling: A Collaborative, Visual, and Agile Way to Build Domain-Driven Software (2021). The companion technique to EventStorming. Particularly effective with domain experts who prefer structured narrative to workshop chaos.

Susanne Kaiser: Architecture for Flow: Adaptive Systems with Domain-Driven Design, Wardley Mapping, and Team Topologies (2025). The most systematic account of how Wardley Mapping, DDD, and Team Topologies connect. Essential for understanding how strategic context, domain decomposition, and team design work together.

Vaughn Vernon: Implementing Domain-Driven Design (2013). The practical companion to Evans. Shows how to apply DDD patterns with modern tools and architectures.

Matthew Skelton and Manuel Pais: Team Topologies: Organizing Business and Technology Teams for Fast Flow (2019). Extends Evans’ bounded contexts into organisational design: how to structure teams around domains for fast, sustainable delivery.

Anthropic: Model Context Protocol (MCP) (2024). The open standard for connecting AI applications with external tools and data sources. The specification and SDK documentation are the primary reference for implementing MCP within bounded contexts.

Google: Agent2Agent Protocol (A2A) (2025). The open protocol for AI agent interoperability, now a Linux Foundation project. The specification, agent card schema, and sample implementations are available on the project site.

InfoQ: Eric Evans Encourages DDD Practitioners to Experiment with LLMs (March 2024). Report on Evans’ keynote at Explore DDD 2024, including his argument that a trained language model is a bounded context.

Ackoff: How to Stop Solving the Wrong Problem

Justin Arbuckle — Thu, 23 Apr 2026 07:44:46 GMT

Somewhere in your organisation right now, a team is using AI to do the wrong thing faster.

They do not know it is the wrong thing. It was the right thing last year. It might even have been the right thing last quarter. But the process they are accelerating was designed for a world that no longer exists, and the AI is making it more efficient with an enthusiasm that would impress Frederick Taylor. Nobody has asked whether the process itself should exist, because everybody is too busy measuring how much quicker it runs.

Russell Ackoff, a systems thinker who spent five decades at the Wharton School dismantling the assumptions behind conventional management, had a phrase for this: doing the wrong thing righter. It is, he argued, the defining pathology of modern organisations. And AI has made it lethal, because doing the wrong thing righter at machine speed is how you automate your own obsolescence before anyone notices.

Ackoff’s work spans operations research, organisational theory, and management philosophy. His central contribution to the clarity problem is a set of distinctions so simple they are easy to dismiss and so important they explain why most transformation programmes produce motion without progress. If your AI strategy feels busy but directionless, Ackoff explains why, and he offers a method for escaping the trap that is, characteristically, the opposite of what most consultants would recommend.

1. Messes Are Not Problems

Ackoff coined the term “mess” to describe something that every transformation leader recognises but that no planning methodology adequately addresses: a system of interacting problems where the interactions matter more than the individual components.

A problem can be isolated, analysed, and solved. A mess cannot, because taking it apart destroys the thing you need to understand. The technology question (which models, which infrastructure) interacts with the skills question (who can use it), which interacts with the governance question (who decides), which interacts with the identity question (who am I now that AI does what I used to do), which interacts with the culture question (what gets rewarded), which interacts with the structural question (how teams are organised). Pull any one of these out and “solve” it independently, and you create new problems in the others.

This is not a metaphor. It is a structural claim. Most transformation programmes are a mess in Ackoff’s precise sense. Most organisations treat them as a collection of independent problems: run a skills programme, buy a platform, set up a governance committee, write a strategy document. Each is “solved” independently. The mess persists because nobody has addressed the interactions.

Eric Evans would recognise this at the domain level; his bounded contexts and context maps are tools for making the mess visible within software systems. Heifetz would recognise it at the leadership level; his adaptive challenges are messes that cannot be solved by existing expertise applied to isolated components. Ralph Stacey would add that the interactions are not just complicated but complex; they produce emergent patterns that no amount of upfront analysis can predict.

Ackoff’s contribution is the bluntest version of the diagnosis: reality presents messes, not problems, and the first step toward clarity is admitting that you are in one.

2. Four Ways to Treat a Problem (and Why Three of Them Fail)

Ackoff distinguished four responses to problems, and the distinction is the sharpest diagnostic I know for what is actually happening inside a transformation programme.

Absolution is ignoring the problem and hoping it resolves itself. This is more common than anyone admits. The AI pilot that nobody cancelled but nobody funded past quarter two? Absolution.
Resolution is reaching into the past for something that worked before. It uses experience and common sense to produce a result that is “good enough.” Most AI adoption programmes start here: what did we do for the last technology wave? Stand up a centre of excellence, run some training, publish a playbook. Resolution produces familiarity, not transformation.
Solution is using research and analysis to find the optimal answer. Hire the consultants, commission the benchmarking study, build the business case. Solution is more rigorous than resolution, but both share a fatal assumption: that the problem has been correctly identified. If it has not, the optimal solution to the wrong problem is still wrong.
Dissolution is redesigning the system so that the problem cannot recur. You do not fix the error within existing assumptions; you change the assumptions. Ackoff considered this the highest form of problem treatment, and the rarest. Argyris’s double-loop learning is dissolution applied to cognition; you do not correct the error within the governing variables, you change the governing variables themselves. Normann’s reframing is dissolution applied to mental models; you do not solve the problem within the existing map, you redraw the map so the problem disappears.

Consider a concrete example. Your development teams are producing AI-generated code whose output requires extensive review because the specifications are vague. Resolution: add more reviewers. Solution: deploy AI-assisted code review tools. Dissolution: redesign the development process so that specifications are precise enough that generated code does not need extensive review. The first two responses accept the existing system and try to manage its consequences. The third changes the system so the consequences do not arise.

The specification-driven development approach this series has been building toward is, in Ackoff’s terms, a dissolution strategy. It does not solve the problem of bad AI output. It helps to eliminate the conditions that produce it.

3. Doing the Wrong Thing Righter

This is Ackoff’s most famous line, and it deserves to be quoted in full:

“All of our social problems arise out of doing the wrong thing righter. The more efficient you are at doing the wrong thing, the wronger you become. It is much better to do the right thing wronger than the wrong thing righter. If you do the right thing wrong and correct it, you get better.”

The distinction is between effectiveness (doing the right thing) and efficiency (doing the thing right). Most organisations focus on efficiency. AI supercharges the trap.

If your customer service process is built around the wrong assumptions about what customers actually need, an AI chatbot will deliver the wrong service experience more efficiently. If your software development process produces specifications that miss the domain, AI code generation will produce the wrong software faster. If your decision-making process optimises against the wrong metrics, AI analytics will optimise the wrong outcomes with greater precision. In every case, the AI is performing brilliantly. The problem is upstream.

Beer’s POSIWID principle (“the purpose of a system is what it does”) is the diagnostic that reveals whether you are in this trap. If the actual output of your AI programme differs from its stated purpose, you are doing the wrong thing righter. Drucker would add the prior question: have you asked “what is our business?” recently, or are you assuming the answer has not changed? Christensen demonstrated with mechanical precision what happens to organisations that never ask: they rationally optimise their way into irrelevance.

4. Idealised Design: The Question Nobody Asks

Ackoff’s most practical tool for getting to clarity is idealised design, and it begins with a thought experiment that most leadership teams find simultaneously liberating and terrifying.

Assume that your organisation was destroyed last night. The environment, the customers, the market, the technology, the talent pool: all still exist. But the organisation itself is gone. Now design the organisation you would create to replace it, subject to only two constraints: it must be technologically feasible (no science fiction) and operationally viable (it could actually work). One additional requirement: the design must incorporate the ability to learn and adapt rapidly.

The concept emerged at a 1951 Bell Laboratories conference when a vice president opened the session by saying: “Gentlemen, the telephone system of the United States was destroyed last night.” That hypothetical destruction freed participants to design what they actually wanted rather than incrementally improving what existed. The distinction matters enormously. Incremental improvement starts from the current state and asks “what can we change?” Idealised design starts from a blank page and asks “what would we build?” The first is constrained by everything the organisation already is. The second is constrained only by physics and viability.

For AI transformation, the question becomes: if this organisation were rebuilt from scratch today, with full access to current AI capabilities, what would it look like? This is not the question most AI strategies ask. Most ask “where can we add AI to our existing processes?” That is preactive planning at best: predicting where AI will help and preparing for it. Ackoff would say it is the wrong question, because it preserves the existing structure and merely decorates it with new technology.

Mintzberg would rightly challenge idealised design as too deliberate; real strategy, he showed, emerges from accumulated action rather than implemented plans. The reconciliation is that idealised design is not a blueprint. It is a direction. The design is continuously revised through experience. What Ackoff provides is the destination that gives emergent action its coherence; without it, emergence is just drift.

5. From Data to Wisdom: Where AI Stops and Humans Must Start

Ackoff’s 1989 paper “From Data to Wisdom” formalised a hierarchy that has become so widely cited it has lost its bite. But for AI transformation, his original five-level version (not the four-level version most people know) is the clearest framework available for understanding where the human specification work sits.

Data is symbols. Information answers who, what, where, when, how many. Knowledge is know-how; it answers “how.” Understanding answers “why.” Wisdom is evaluated understanding; it answers “what is best to do.” Ackoff believed wisdom would probably never be generated by machines, and while that judgment was made decades before large language models, his structural distinction holds. AI processes data into information at superhuman speed. It applies knowledge patterns with increasing sophistication. But the understanding of why a particular specification matters, and the wisdom to judge what is the right thing to build, remain human capacities that no model release has displaced.

This reframes the specification gap. The human must supply the understanding and wisdom (the “why” and the “what is best”) in a form precise enough to constrain the machine’s knowledge-level processing. That is what a specification is: the bridge between human wisdom and machine capability. Without it, the machine operates at the knowledge level, generating technically competent output that may have nothing to do with what the organisation actually needs.

The organisations that get clarity on what to do with AI will be the ones that stop solving problems and start dissolving them. They will stop asking “where can we add AI?” and start asking “what would we build if we could start again?” They will stop optimising the wrong thing and start, at last, doing the right thing badly enough to learn.

An Organisational Prompt

(An Organisational Prompt is something you can do now, in your organisation, to put the ideas in this article to work.)

Ask the Destruction Question

In your next leadership meeting, try this: “If our department were dissolved overnight and we had to rebuild it from scratch with today’s technology, what would we build?” Give people ten minutes to write their answers privately before anyone speaks. Compare the answers with what you are actually doing. The gap is your mess.

Drucker: Perfectly Logical, Completely Wrong

Justin Arbuckle — Mon, 20 Apr 2026 07:00:41 GMT

The Learning phase article on Drucker asked why organisations cannot make knowledge workers productive. The answer was the specification problem: the knowledge worker must define the task before they can do it, and most organisations have never learned to help them. That was a learning problem. This is the deciding problem that sits beneath it: you cannot define the task if you do not know what the organisation is for. (See the about section for an overview of the phases.)

Drucker had a name for this. He called it the theory of the business.

1. The Theory of the Business

In a 1994 Harvard Business Review article that deserves to be read more often than it is, Drucker argued that every organisation operates on a set of assumptions. Assumptions about its environment: the society, the market, the customer, the technology. Assumptions about its specific mission: what it exists to do. Assumptions about its core competencies: what it must be excellent at to fulfil that mission. Together, these assumptions constitute the organisation’s theory of the business. When the theory is valid, the organisation makes good decisions almost automatically, because the assumptions guide action without requiring every decision to be escalated. When the theory is invalid, no amount of effort, talent, or technology will compensate.

The theory of the business is not the strategy. It is the set of assumptions that makes strategy possible. It is the water the fish cannot see.

And Drucker’s most unsettling claim is that every theory of the business eventually becomes invalid, because the environment changes, and the assumptions do not change with it.

The connection to the Deciding phase architecture is structural. Herbert Simon’s bounded rationality tells you that decision-makers cannot process everything; the theory of the business is what they use instead. It is the set of assumptions that pre-decides most questions before they are asked. Eric Evans’s ubiquitous language tells you that precision of description determines the quality of the specification; the theory of the business is the meta-language that determines what the organisation considers worth describing at all. Beer’s requisite variety tells you the architecture must match the environment’s complexity; an invalid theory of the business is a variety attenuator so lethal that it filters out the very signals that would reveal its own invalidity. We will cover all of these thinkers in other articles.

2. When the Theory Fails Silently

Drucker’s examples are instructive. IBM in the early 1990s had a valid theory of the business (the computer industry is driven by hardware) that became invalid (it is driven by solutions). GM had a valid theory (the American car market is segmented by income) that became invalid (it is segmented by lifestyle). In both cases, the organisation continued to make internally rational decisions that were externally catastrophic, because the decisions were rational within a theory that no longer matched reality. Look at IBM now…(again).

The AI transformation context reproduces this pattern exactly. Most large enterprises operate on a theory of the business that was formed before AI changed what is possible. The assumptions run deep: that competitive advantage comes from proprietary processes, that knowledge resides in individuals, that specification is a planning activity, that quality requires human review, that scale requires headcount. Each of these assumptions was valid. Several are becoming invalid. The organisation that continues to decide on the basis of yesterday’s theory will make decisions that are perfectly logical and completely wrong.

The difficulty is that invalid theories do not announce themselves. They fail silently.

The decisions feel right because they are consistent with the assumptions. The results feel wrong but the cause is invisible, because the cause is not a bad decision but a valid decision made within an obsolete frame. POSIWID applies: if your AI transformation is producing governance artefacts rather than changed practice, that is not a failure of execution. It is the theory of the business doing exactly what it was designed to do, which is to protect the assumptions on which the current organisation was built.

3. Systematic Abandonment as a Decision Discipline

In the Learning phase, systematic abandonment was framed as a refactoring discipline: stop doing what no longer works. In the Deciding phase, it becomes something sharper. It is the discipline of testing the theory of the business by asking which of its assumptions still hold.

Drucker’s question is brutal in its simplicity: “If we were not already doing this, would we start now?” Applied to assumptions rather than processes, it becomes: if we were not already assuming this about our market, our customers, our capabilities, would we adopt this assumption today? If the answer is no, the theory needs revision, and every decision that flows from the invalid assumption needs re-examination.

This is where Drucker connects to Rumelt’s diagnosis. Rumelt argues that the first element of good strategy is an honest account of the challenge. Drucker provides the prior step: before you can diagnose the challenge, you must test whether the theory through which you perceive the challenge is still valid. Most organisations skip this step because testing the theory means questioning the identity. And questioning the identity is the hardest thing any organisation can do, because the people whose careers were built on the old theory have every reason to defend it.

4. The Knowledge Worker Decides

Drucker’s most radical claim for the Deciding phase is that knowledge workers must manage themselves. They must decide what the task is, how to do it, and what quality means. This is not delegation. It is a structural requirement of knowledge work. The person who understands the domain is the person who must decide what needs doing, because nobody else has the knowledge to make that decision well.

In an AI-mediated world, this claim becomes urgent. When AI can generate implementations from specifications, the constraint shifts to the person who writes the specification. That person must decide what the system should do, what it should not do, how to validate the output, and when the specification itself needs to change. These are not technical decisions. They are knowledge decisions that require domain understanding, judgment, and the authority to act on that judgment. An organisation that centralises specification authority in a planning function has separated deciding from knowing, which is the Taylorist error Drucker spent his career opposing, reproduced in a new medium.

Ackoff’s distinction between dissolving and solving is relevant here. The organisation that tries to solve the specification problem by creating a central specification team is solving within the existing structure. The organisation that dissolves the problem recognises that specification authority must live with domain knowledge, which means redesigning decision rights so that the people closest to the domain are the people who decide what gets specified.

5. Three Tests for the Theory

Drucker specified three requirements for a valid theory of the business, each of which doubles as a diagnostic for decision quality.

First, the assumptions about environment, mission, and competencies must fit reality. This sounds obvious, but most organisations have never written their assumptions down, let alone tested them. The assumptions are embedded in budget allocations, reporting structures, incentive schemes, and hiring criteria. They are the water. Making them visible is itself an act of deciding, because it forces choices about which assumptions to keep and which to abandon.
Second, the assumptions must fit each other. An organisation that assumes its competitive advantage is speed but measures success by compliance throughput has contradictory assumptions. An organisation that assumes AI will transform its business but funds AI from the cost-reduction budget has contradictory assumptions. The contradictions are not visible to the people inside them, because each assumption was adopted at a different time, by different leaders, for different reasons. The theory of the business has never been read as a single document, because it has never been written as one.
Third, the theory must be known and understood throughout the organisation. This is where Drucker meets Evans most directly. Evans’s ubiquitous language is the theory of the business made operational in a bounded context. When the payments team and the fraud team use the word “customer” to mean different things, the theory of the business has not been translated into the language the teams actually use. The assumptions exist in the strategy deck. They do not exist in the code, the specification, or the daily conversation of the people making decisions.

(An Organisational Prompt is something you can do now....)

Write down your theory of the business.

Not the strategy. Not the vision statement. The assumptions. Sit with your leadership team and answer three questions: what do we assume about our environment that justifies our current direction? What do we assume about our mission that determines what we say yes and no to? And what do we assume about our core competencies that determines where we invest? Write the answers on one page. Then test each assumption by asking: when was this last validated by evidence from outside the organisation? If the answer is “never” or “when we wrote the strategy,” you have found the source of your decision problems. The theory of the business is the invisible architecture of every decision your organisation makes. Making it visible is the first act of deciding well.

Further Reading

Peter Drucker: The Theory of the Business (Harvard Business Review, 1994) - The single most important Drucker article for the Deciding phase. Short, devastating, and freely available. Read it for the three requirements and the case studies of theories that expired.

Peter Drucker: Management Challenges for the 21st Century - Where Drucker addresses the challenge of managing yourself, managing knowledge worker productivity, and the change leader. The chapter on the theory of the business extends the HBR article into a full diagnostic framework.

Peter Drucker: The Practice of Management - Where “the purpose of a business is to create a customer” first appears. Still the clearest statement of why purpose must be externally grounded.

Rumelt & Martin: Goals Are Not a Strategy

Justin Arbuckle — Thu, 16 Apr 2026 07:43:34 GMT

Most transformation programmes have a strategy document. It contains an aspiration (“become an AI-first organisation”), a set of goals (adoption targets, cost savings, headcount adjustments), and a list of initiatives (pilots, platforms, training programmes).

It is likely wrong; not in its ambition, but in its genre. It has goals where it needs a diagnosis. It has initiatives where it needs coherent action. It has aspiration where it needs choices. Richard Rumelt would call it bad strategy. Roger Martin would say the organisation is playing to play, not playing to win. They might both be right, and the distinction matters because the difference between bad strategy and good strategy is not quality of thinking. It is willingness to decide.

1. The Kernel: Diagnosis, Guiding Policy, Coherent Action

Rumelt’s contribution is to strip strategy to its essential structure.

A good strategy is a coherent response to a high-stakes challenge.

It consists of three inseparable elements: a diagnosis that defines what is going on, a guiding policy that establishes the overall approach, and coherent actions that carry out the policy. Remove any one and the strategy collapses.

The diagnosis is the foundation, and it is the element most consistently absent. Rumelt is blunt: “a great deal of strategy work is trying to figure out what is going on.” The diagnosis is not a statement of goals or desires. It is an honest account of the challenge, including the uncomfortable parts. The medical analogy is deliberate: a doctor who prescribes treatment without diagnosis is committing malpractice. An executive who, for instance, launches more AI initiatives without diagnosing what prevents the organisation from using AI effectively is doing the same.

The guiding policy channels and constrains action without fully specifying it. It creates advantage by concentrating effort on a pivotal aspect of the situation. The coherent actions are the punch: coordinated steps designed to carry out the policy. They must be mutually reinforcing, not a disconnected wish list. Rumelt observes that when executives complain about “execution problems,” it is usually because they confused setting goals with setting strategy. Bringing strategy down to action level flushes out the conflicts that aspirational language conceals.

For AI transformation, the kernel exposes the standard failure pattern. The fluff is “leveraging AI to drive innovation and competitive advantage.” The failure to face the challenge is the omission of what actually prevents adoption: specification capability, domain knowledge fragmentation, governance designed for a different era, cultural resistance rooted in legitimate fear. The goals masquerading as strategy are “deploy AI across 50% of business processes by 2026.” And the absence of coherent action is a portfolio of disconnected pilots that nobody has diagnosed as a system.

2. The Crux: Finding the Decisive Point

In his later work, Rumelt introduced the crux: the most critical aspect of a challenge that is also solvable. In rock climbing, the crux is the hardest section of the route; if you cannot get past it, you should not attempt that climb. In strategy, the crux forces the same discipline. Focus on the decisive point, not on everything at once.

The crux of most AI transformations is not the technology. It is the organisation’s inability to articulate what it wants with enough precision for AI to act on it. This is a specification problem, not a tooling problem. Focusing resources on AI platforms when the crux is specification capability is what Rumelt calls the chain-link error: you improve one link while the weakest link remains untouched, and the system’s performance remains bounded by what it cannot do, not by what it can.

The connection to Herbert Simon is direct. Simon’s proximate objectives, goals close enough to be feasible, are the strategic application of his idea of bounded rationality. Rumelt’s proximate objectives serve the same function: instead of “become AI-first,” set an objective achievable this quarter. “Three teams will have written specifications that generate working AI outputs without manual rework.” Each proximate objective creates momentum and learning that informs the next. This is the antithesis of the big-bang transformation programme, and it is the only approach that respects what Simon showed about how decisions actually get made in real organisations.

3. The Choice Cascade: Strategy as Five Integrated Decisions

Martin’s contribution is complementary. Where Rumelt begins with the challenge and works toward action, Martin begins with aspiration and works toward the systems that make it real. His Strategy Choice Cascade, developed with A.G. Lafley at Procter and Gamble, frames strategy as five integrated choices: a winning aspiration, where to play, how to win, must-have capabilities, and enabling management systems.

The heart is where to play and how to win, and they must be developed together. A where-to-play without a how-to-win is an aspiration. A how-to-win without a where-to-play is a capability in search of a market. Most AI strategies define where to play (“we will use AI in customer service, underwriting, and operations”) without ever specifying how to win (“our competitive advantage will be superior specification quality from deep domain expertise”). The where-to-play sounds strategic. Without a matched how-to-win, it is a shopping list.

Martin’s sharpest distinction is between playing to win and playing to play.

“When companies set out to participate in a market instead of winning, they will inevitably fail to make the tough choices that would make winning even a remote possibility.”

Playing to play means deploying AI broadly and hoping something sticks. Playing to win means choosing specific domains where your organisation’s domain knowledge gives it a specification advantage that competitors cannot match, and concentrating resources there. The first feels responsible. The second feels risky. Only the second is strategy.

The fourth and fifth boxes, capabilities and management systems, are where most organisations lose the plot. Without them, the strategy cannot be executed because it has not been translated into what the organisation must be able to do. If the how-to-win is “superior specifications from domain experts,” then the must-have capability is specification skill, which means the Learning conditions: training, practice, feedback loops, and a culture that values specification quality over AI output volume. The enabling management systems are the measures that tell you whether it is working.

4. Integrative Thinking: Refusing the False Choice

Martin’s second major contribution is integrative thinking: the discipline of refusing to accept unpleasant trade-offs as given. Most people, when faced with opposing options, simply choose one at the expense of the other. Martin’s research found that the most effective leaders use the tension between opposing models as raw material for creating a superior third option. Follett is a good mirror here.

The AI transformation is full of apparent either/or choices. Maintain governance or empower experimentation. Invest in tooling or invest in people. Centralise AI strategy or let teams diverge. Martin would say each dichotomy is false. The integrative response to “governance or experimentation” is to design governance that enables experimentation: guardrails that constrain the playing field without constraining the play within it. This is Bungay’s directed opportunism (from the military leaders article) applied to AI, and Ackoff’s dissolving applied to the governance problem.

The connection to Beer is structural. Beer’s 3-4 homeostat holds the tension between inside-and-now (System 3, optimisation) and outside-and-then (System 4, intelligence). Martin’s integrative thinking is the cognitive discipline that Beer’s architecture makes structurally possible. The viable system does not choose between exploit and explore. It maintains both, held in tension by an identity (System 5) that refuses the either/or.

5. The Knowledge Funnel: Mystery, Heuristic, Algorithm

Martin’s knowledge funnel describes how value is created through the progressive refinement of understanding. A mystery (something we cannot explain) is narrowed to a heuristic (a rule of thumb that guides action) and then codified into an algorithm (a fixed formula that produces predictable outcomes).

Most organisations are in the mystery phase of AI adoption: they do not yet understand what AI can reliably do in their specific context. The temptation is to skip to algorithm: buy a platform, deploy standard use cases, measure adoption percentages. This skipping produces what Martin calls the reliability bias: organisations adopt AI in the most predictable, measurable ways (chatbots, summarisation, code completion) while ignoring the harder mysteries (domain-specific reasoning, specification-driven generation, human-AI collaboration models that do not yet exist).

The heuristic phase is where the real value lies. Teams experimenting with AI in their specific domain, developing rules of thumb about what works, building tacit knowledge about specification quality. This is Mintzberg’s potter at the wheel, translated into the AI context. Organisations that skip the heuristic phase and jump to algorithmic deployment will get commodity AI applications that provide no competitive advantage. The heuristic phase is uncomfortable because it cannot be measured on a dashboard. It looks like mess. It is the mess from which strategy emerges.

6. Strategy as Hypothesis

Rumelt and Martin converge on a single insight that reframes how leaders should think about deciding. A strategy is not a plan. It is a hypothesis. Compare this to Stacey and Popper.

Rumelt insists that a good strategy is a testable claim about how to overcome a challenge. Martin argues that the five-box cascade is a set of bets: “we believe that if we play here and win this way, we will achieve our aspiration.” Both insist that a strategy that cannot be wrong is not a strategy. It is a truism.

This reframes failure. A strategy that does not produce the expected result is not a disgrace. It is a hypothesis disconfirmed, which is information. The willingness to make a bet that might be wrong is the price of strategic clarity. The unwillingness to bet is, in both Rumelt’s and Martin’s terms, the hallmark of bad strategy: the organisation has avoided choosing, and has therefore avoided deciding.

Popper is the philosophical ancestor here. A strategy that cannot be falsified is the organisational equivalent of a theory that cannot be tested. It is safe, it is comfortable, and it tells you nothing.

(An Organisational Prompt is something you can do now....)

Diagnose before you prescribe.

Take your current AI strategy and remove the aspirations, the goals, and the initiative list. What remains is the diagnosis: the honest account of what is preventing your organisation from using AI effectively. If nothing remains, you do not have a strategy. You have a wish list. Write the diagnosis. One page. What is actually going on? What is the crux, the single hardest obstacle that is also solvable? If you cannot name it, you are not ready to decide. If you can name it but the document does not mention it, the strategy has been written to avoid the truth rather than to confront it. Rumelt’s first hallmark of bad strategy is the failure to face the challenge. Face it. Everything else follows from that.

Further Reading

Richard Rumelt: Good Strategy/Bad Strategy - The essential starting point. The kernel framework, the four hallmarks of bad strategy, and the insistence that strategy begins with diagnosis. One of the most useful management books written this century.

Richard Rumelt: The Crux - Extends the kernel with the crux concept and the Strategy Foundry process for group strategy creation.

Roger Martin and A.G. Lafley: Playing to Win - The Strategy Choice Cascade. Practical, case-rich, and immediately applicable. The distinction between playing to win and playing to play is worth the book alone.

Roger Martin: The Design of Business - The knowledge funnel, the reliability-validity tension, and abductive reasoning. Essential for understanding why organisations systematically under-explore.

Roger Martin: The Opposable Mind - Integrative thinking. Why the best leaders refuse the either/or and how they generate superior options from opposing models.

Events Change Organisations, Not People. Learning Changes People.

Justin Arbuckle — Wed, 15 Apr 2026 07:03:04 GMT

A leader I know at another company described the moment his organisation realised that AI would change everything. Not the strategy offsite. Not the board presentation. The moment a domain expert sat with a language model and, in forty minutes, produced a working specification that would have taken a team two weeks. The room went quiet. Then someone said: “If that works, what are we all doing?”

That was an event. Not in the calendar sense; in the transformational sense. Something happened that could not be unseen. The question is what happened next. In most organisations, what happens next is: nothing structural. The event is discussed, admired, presented upward, and gradually absorbed into existing patterns.

Six months later, the organisation is doing exactly what it did before, with a new vocabulary. The event produced language but not structure. The language produced action but not agency. The cycle stalled, and the organisation lost the capacity to respond to the next disruption because it never finished responding to the first one.

1. The ELSA Cycle: How Change Actually Moves Through Organisations

The ELSA model describes the mechanism by which organisations process change. It has four stages, and each transition is where transformation either advances or dies.

Event is the disruption: the demonstration that cannot be unseen, the competitive move that invalidates assumptions, the technology shift that renders a capability obsolete. Events can be external (a market shift, a competitor’s move) or internal (a gesture: an experiment, a provocation, a deliberate attempt to surface what has been hidden). Events are charismatic in Weber’s sense: they derive their power from direct experience and emotional impact, not from rules or tradition. They disrupt existing frameworks. They create a burst of transformative energy.

Language is what happens when the organisation begins to name what the event revealed. New categories emerge: “prompt engineering,” “agentic workflows,” “specification-driven development.” The language creates shared reference points. It makes the event discussable. It begins the process of routinisation: channelling disruptive energy into stable concepts that people can work with.

Structure is what happens when the new language becomes institutional. Governance frameworks are written. Teams are reorganised. Processes are redesigned. Incentives are realigned. The new patterns are formalised into arrangements that can operate without the charismatic catalyst that started the cycle.

Agency is what happens when the new patterns become self-sustaining. People act from the new framework without being told to. The new way of working reproduces itself through practice, not instruction. The organisation has not merely adopted a change; it has become a different kind of organisation, one whose dispositions generate different behaviour.

The cycle is not a one-time transformation. It is the mechanism by which organisations navigate continuous change. But only if each transition succeeds. And this is where the nine probes become essential.

2. Event to Language: Can the Organisation Name What Just Happened?

The transition from event to language is where most transformation programmes die their first death. The event happens. It is powerful, disorienting, generative. And then the organisation must find words for what it experienced. This is harder than it sounds, because honest language requires conditions that most organisations do not have.

Three probes govern this transition.

Truth-telling. Can people say what they actually saw? The event may have revealed that existing competencies are obsolete, that the current strategy is based on assumptions that no longer hold, that the organisation’s competitive position is weaker than anyone has admitted. If people cannot say these things, if the gap between formal meetings and corridor conversations is wide, the language that emerges will be diplomatic rather than diagnostic. It will name what is comfortable rather than what is true. And language that does not capture reality cannot produce structures that address it.

Proximity. Are the people creating the language close enough to the event to describe it accurately? If the event happened in a team room but the language is being crafted in a boardroom, every layer of hierarchy between the experience and the description is a reduction in fidelity. The leader who saw the domain expert produce a specification in forty minutes has proximity. The steering committee that heard about it third-hand does not. The language they create will describe what they imagined, not what happened. Ohno would recognise the mechanism instantly: go to the gemba. Do not decide from reports.

Loss. Can people tolerate what the event implies they must give up? Every genuine event carries a loss: a competency devalued, a role diminished, an identity threatened. If people cannot tolerate the loss, they will not name the event honestly. They will domesticate it: “AI is a tool that will augment our existing processes” rather than “AI means that the way we have always worked is over.” The domesticated language feels safer. It is also useless, because it cannot produce structures that address the actual disruption.

When all three probes pass, the organisation produces language that is truthful, precise, and unflinching. When any probe fails, the language drifts toward comfort, and the cycle stalls at its first transition.

3. Language to Structure: Can the Organisation Formalise What It Has Named?

The transition from language to structure is where transformation programmes die their second death. The organisation has found words for what happened. The words are circulating in presentations, strategy documents, town halls. But words are not structure. The question is whether the new language will reshape the institution or merely decorate it.

Three probes govern this transition.

Rewards vs words. Is the organisation changing what it rewards, or just what it says? This is the most diagnostic single question in transformation. If the organisation talks about “specification-driven development” but still promotes people who ship code fast, the language is disconnected from the incentive structure. People will learn the new vocabulary and continue the old behaviour, because the old behaviour is what gets rewarded. New language without new incentives is experienced as hypocrisy, and hypocrisy kills the energy that the event generated.

Structures serve or obstruct. Do the new structures serve the work, or does the work serve the structures? When the organisation creates an AI Centre of Excellence, an AI governance framework, an AI risk assessment process, the question is whether these structures enable people to work differently or whether they exist to manage the anxiety of leaders who need to feel that the disruption is under control. When governance exists to protect governance, the institution has inverted. The structure has absorbed the language without changing the practice. This is Weber’s routinisation at its most insidious: the charismatic energy of the event is channelled into bureaucratic arrangements that look like transformation and function as restoration.

Can the organisation stop what no longer works? New structure requires dismantling old structure. If the organisation cannot abandon processes, roles, and governance arrangements whose original purpose has expired, it will layer new structures on top of old ones. The result is not transformation but accumulation: more process, more governance, more overhead, less capacity to act. The inability to stop is often a greater barrier than the inability to start. Every structure that persists past its purpose is a tax on the organisation’s ability to respond to the next event.

When all three probes pass, the organisation produces structures that embody the new language in institutional form: incentives, processes, governance, and team designs that make the new way of working the path of least resistance. When any probe fails, the structure becomes a monument to a change that never happened.

4. Structure to Agency: Can the New Patterns Become Self-Sustaining?

The transition from structure to agency is the most difficult and the least visible. Structure is necessary but not sufficient. An organisation can have all the right governance, all the right team designs, all the right incentive structures, and still fail to develop agency, because agency is not a structural property. It is a behavioural one. Agency means that people act from the new framework without being told to, because they have internalised it as practice rather than received it as instruction.

Three probes govern this transition.

Practice vs instruction. Is the new capability being practised or merely taught? Training changes vocabulary. Practice changes capability. If the organisation’s approach to the new structure is to run workshops, certification programmes, and e-learning modules, it is investing in instruction. Instruction produces people who can describe the new way of working. Practice produces people who can do it. The difference is the difference between reading about swimming and swimming. Bourdieu would recognise the mechanism: the habitus, the embodied dispositions that generate practice below conscious awareness, is changed by practice, not by instruction. You cannot lecture someone into a new habitus.

Belief. Do people believe that the new structure will endure? Learned helplessness from previous failed changes drains the conviction that this time will be different. If the organisation has a history of announcing transformations that quietly expire after eighteen months, people will wait out the current one. They will comply with the new structures while preserving the old practices, because experience has taught them that the old practices will outlast the new structures. Belief is not optimism. It is the assessment, based on observable evidence, that the organisation is serious. The evidence is in the probes that preceded this one: did the language tell the truth? Did the rewards change? Did old structures get dismantled? If yes, belief follows. If no, no amount of leadership communication will produce it.

Can the organisation integrate conflict? The transition from structure to agency always generates friction. People who thrived under the old arrangements resist the new ones. Teams that built their identity around capabilities that the new structure devalues experience the transition as an attack. If the organisation suppresses this conflict, whether through dominance, avoidance, or the pretence that everyone is aligned, the new patterns cannot stabilise. They exist on the surface while the real dynamics continue underground. Follett’s integration (finding solutions that neither party had imagined, rather than compromising or dominating) is the only mechanism that converts structural change into genuine agency. The conflict is not an obstacle to the transition. It is the transition. How the organisation handles it determines whether the new patterns take root or wither.

When all three probes pass, agency emerges: the new way of working reproduces itself through practice, and the organisation has genuinely changed. When any probe fails, the structure remains a shell, and the organisation reverts to its prior state the moment pressure is applied.

5. Where the Probes Cluster: The Three Levers

The nine probes are not distributed randomly across the ELSA transitions. They cluster by the three levers that govern the entire series: Identity, Information, and Interaction.

The Identity probes (loss, practice vs instruction, belief) appear at the transitions where the person must change: at the moment the event demands giving something up, at the moment the new structure demands new practice, and at the moment where conviction determines whether the change holds. Identity is the lever that determines whether the individual can move. Without it, the event is resisted, the language is domesticated, and the structure is a performance.

The Information probes (truth-telling, proximity, rewards vs words) appear at the transitions where the organisation must describe reality: at the moment the event must be named, at the moment the language must be backed by incentives. Information is the lever that determines whether the organisation can see. Without it, the language is fiction, the structure is theatre, and the cycle operates on fantasy rather than evidence.

The Interaction probes (structures serve or obstruct, can the organisation stop what no longer works, can the organisation integrate conflict) appear at the transitions where the parts of the organisation must relate differently: at the moment new structures must replace old ones, at the moment the friction between old and new must be resolved. Interaction is the lever that determines whether the system can reorganise. Without it, new structures accumulate on top of old ones, conflict is suppressed, and the organisation calcifies.

The directional logic holds: Identity constrains Information constrains Interaction. If people cannot tolerate loss, they cannot tell the truth. If they cannot tell the truth, the structures they build will be based on fiction. If the structures are based on fiction, the interactions they produce will reproduce the old patterns. But Interaction is where intervention occurs: change the structures, change the incentives, change the way conflict is handled, and Identity and Information shift in response.
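
The grid of nine probes can be sketched as a small lookup table. What follows is a minimal illustration in Python, not a tool from the series: the names Probe, probes_by_lever, and first_stall are hypothetical, and treating an unassessed probe as failed is an assumption made only for the sketch.

```python
from typing import Dict, List, NamedTuple, Optional


class Probe(NamedTuple):
    name: str
    transition: str  # the ELSA transition the probe governs
    lever: str       # Identity, Information, or Interaction


# The three Learning transitions, in the order the cycle passes through them.
TRANSITIONS = ["Event -> Language", "Language -> Structure", "Structure -> Agency"]

# Probe names, transitions, and levers as given in the article.
PROBES = [
    Probe("truth-telling", TRANSITIONS[0], "Information"),
    Probe("proximity", TRANSITIONS[0], "Information"),
    Probe("loss", TRANSITIONS[0], "Identity"),
    Probe("rewards vs words", TRANSITIONS[1], "Information"),
    Probe("structures serve or obstruct", TRANSITIONS[1], "Interaction"),
    Probe("stop what no longer works", TRANSITIONS[1], "Interaction"),
    Probe("practice vs instruction", TRANSITIONS[2], "Identity"),
    Probe("belief", TRANSITIONS[2], "Identity"),
    Probe("integrate conflict", TRANSITIONS[2], "Interaction"),
]


def probes_by_lever(lever: str) -> List[str]:
    """List the probe names that belong to one lever, in cycle order."""
    return [p.name for p in PROBES if p.lever == lever]


def first_stall(passed: Dict[str, bool]) -> Optional[str]:
    """Return the first transition with a failing probe, or None if all pass.

    Assumption for this sketch: a probe missing from `passed` counts as failed.
    """
    for transition in TRANSITIONS:
        governing = [p for p in PROBES if p.transition == transition]
        if any(not passed.get(p.name, False) for p in governing):
            return transition
    return None
```

Given an assessment in which every probe passes except belief, first_stall reports the Structure -> Agency transition, which matches the claim that the cycle stalls at the first transition where any probe fails.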

6. The Virtuous Cycle

An organisation that has successfully navigated one complete Learning ELSA cycle has not merely survived a disruption. It has expanded its capacity to perceive and respond to the next one.

This is Bateson’s Learning II made operational. The organisation has not just learned a new response (Learning I). It has learned how to learn from disruption (Learning II). The probes that enabled the first cycle become the sensing apparatus for the next one. Truth-telling, practised during the first transition, becomes the norm that allows the organisation to see the next event clearly. Proximity, maintained during the creation of language, keeps the organisation close enough to reality to notice when reality changes. The capacity to integrate conflict, developed during the transition to agency, means the next event is experienced as generative rather than threatening.

Each successful cycle expands what Levin calls the cognitive light cone: the spatiotemporal scale of the goals the organisation can pursue and the disruptions it can perceive. Each failed cycle contracts it. An organisation that stalls at the language stage, producing new vocabulary without new structure, has a smaller light cone after the event than before it, because it has consumed energy and credibility without producing capability.

This is why transformation is not a project with a start and end date. It is a cycle that the organisation must be able to execute continuously, at varying speeds, across multiple simultaneous disruptions. The nine probes are not a checklist to complete once. They are the conditions that must be maintained for the cycle to keep turning.

7. The Rotation: Why the Phases Start in Different Places

Everything in this article so far describes the Learning phase. Learning runs E → L → S → A. It starts with Event because learning is triggered by disruption; something must happen before you can learn from it. It ends with Agency because learning succeeds when new dispositions are self-sustaining.

But the series has four phases: Learning, Deciding, Building, Leading. And each phase enters the ELSA cycle at a different position. This is not a design choice. It is a structural necessity, because each phase produces a different kind of output, and the kind of thing one phase produces is not the same kind of thing the next phase requires as input. The gap between output and input is what the phase transition must bridge.

Learning ends with Agency: people can now tell the truth, practise new capabilities, tolerate loss, integrate conflict. Agency is a capacity, not a description. You cannot hand a capacity directly to a process that needs description. Agency must be applied to produce description. The first thing an organisation does with its Learning Agency is describe its domain honestly, something it could not do before the Learning conditions were in place. So Deciding begins at Language. The Deciding ELSA cycle runs L → S → A → E.

Deciding ends with Event: a specific, bounded, buildable thing that the organisation has designed its way toward. An Event is a specification, not a system. You cannot hand a specification directly to a process that needs construction. A specification must be built to become structure. So Building begins at Structure. The Building ELSA cycle runs S → A → E → L.

Building ends with Language: the organisation discovers what to say about what it built (what worked, what failed, what the operation revealed that the specification did not anticipate). Language is knowledge, not the capacity to act on it. You cannot hand knowledge directly to a process that needs action. Knowledge must be internalised to become agency. So Leading begins at Agency. The Leading ELSA cycle runs A → E → L → S.

Leading ends with Structure: the institutional redesign that enables the organisation to perceive and respond to the next disruption. Structure is an arrangement, not an experience. You cannot hand an arrangement directly to a process that needs disruption. A structure must be encountered (tested, stressed, surprised) to produce an event. So Learning begins at Event. And the cycle completes.

The four phases of the series are one rotation of ELSA at the macro level. Each phase owns one starting position. Each handoff bridges the gap between what one phase produces and what the next phase needs. The gap is never zero, because a capacity is not a description, a specification is not a system, knowledge is not agency, and an arrangement is not a disruption. The rotation exists because transformation is never a direct handoff. It is always a translation.

This is the architecture of the series. Learning (E → L → S → A) creates the conditions for honest description. Deciding (L → S → A → E) designs toward a buildable event. Building (S → A → E → L) constructs, operates, and discovers. Leading (A → E → L → S) acts, perceives, names, and reorganises. And the reorganisation produces the structure that the next disruption will test.
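
The rotation can be checked mechanically. A minimal sketch in Python, assuming only the four stages and four phases named above; the function names phase_cycle and handoffs are hypothetical and exist only for illustration.

```python
from collections import deque
from typing import List, Tuple

STAGES = ["Event", "Language", "Structure", "Agency"]
PHASES = ["Learning", "Deciding", "Building", "Leading"]


def phase_cycle(phase: str) -> List[str]:
    """Return the ELSA order for a phase.

    Each phase enters the cycle one stage later than the phase before it,
    so the order is the base E-L-S-A sequence rotated by the phase's index.
    """
    stages = deque(STAGES)
    stages.rotate(-PHASES.index(phase))
    return list(stages)


def handoffs() -> List[Tuple[str, str, str, str]]:
    """List each handoff as (phase, ends with, next phase, begins at)."""
    result = []
    for i, phase in enumerate(PHASES):
        next_phase = PHASES[(i + 1) % len(PHASES)]
        result.append(
            (phase, phase_cycle(phase)[-1], next_phase, phase_cycle(next_phase)[0])
        )
    return result


if __name__ == "__main__":
    for phase in PHASES:
        print(phase, " -> ".join(phase_cycle(phase)))
    for phase, end, next_phase, start in handoffs():
        print(f"{phase} ends with {end}; {next_phase} begins at {start}")
```

In every handoff the stage a phase ends with differs from the stage the next phase begins at, which is the sketch's version of the claim that the gap is never zero.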

8. From Learning Agency to Deciding Language

The handoff from Learning to Deciding is the first phase transition in the series, and it illustrates how all the transitions work.

Learning Agency means the organisation can now tell the truth about its situation. Its people are close enough to reality to see what is actually happening. It has dismantled structures that no longer serve the work. It can integrate conflict. It has practised new capabilities, not merely been instructed in them. In short: it has the conditions for honest description. And honest description, in an organisation that has genuinely learned, is itself a challenge. The organisation now sees, with unflinching precision, the domain in which it must decide. That seeing is not yet a decision. It is the Language that opens the Deciding cycle.

The Deciding cycle has its own probes, mapped to its own ELSA transitions. Where the Learning probes ask “can this organisation learn?”, the Deciding probes ask “can this organisation treat decisions as design challenges?” Can it describe its domain in language practitioners actually use? Can it distinguish what it knows from what it assumes? Can it name what it will not do? Can it hold competing designs without premature closure? Does the decision process produce what it intends? These probes govern the Deciding transitions in the same way that the Learning probes govern the Learning transitions.

And the Deciding cycle ends not with Agency but with Event: a specific, bounded, buildable thing. Not a strategic priority. Not a programme of work. Something precise enough that the Building phase can construct it. The output of Deciding is the input of Building, translated through the same rotation: a specification (Event) must be constructed (Structure) before it can become operational.

This is why the Learning phase must come first, and why organisations that skip it pay the price at every subsequent phase. An organisation that attempts to decide without having learned (without truth-telling, without proximity, without the capacity to integrate conflict) cannot produce honest Language. Without honest Language, it cannot examine its own Structure. Without structural examination, it cannot develop the Agency to commit. And without genuine commitment, it cannot produce the Event that Building requires. The decisions will look like decisions. They will have the form of design. But they will be pattern-matching against a distribution the organisation has never honestly examined. They will be, in the language of the companion essay, organisational hallucinations: confident, fluent, plausible, and wrong.

The organisation that completes the Learning cycle before entering the Deciding cycle has earned the right to its own clarity. Its decisions will be constrained; Simon guarantees that. Its descriptions of reality will be imperfect; Ohno guarantees that, which is why he insisted on going back to the gemba again and again. Its structures will eventually need redesigning; Beer guarantees that. But the constraints will be real, not imagined. The descriptions will be shared, precise, and grounded in what people actually see. The structures will have been built to serve the work, not to reproduce the past.

The cycle turns. Learning produces the agency to describe. Describing produces the architecture to commit. Committing produces the event to build. Building produces the knowledge to lead. Leading produces the structure that the next disruption will test.

The organisation that can navigate this continuously is the one that survives what it cannot predict.

Further Reading

Gregory Bateson: Steps to an Ecology of Mind (1972). The levels of learning and the insistence that mind is a property of the system, not the individual. Learning II (learning to learn from disruption) is the capacity the ELSA cycle builds when it completes.

Pierre Bourdieu: Outline of a Theory of Practice (1977). The habitus and why practice changes capability where instruction cannot. The transition from structure to agency is a transformation of habitus.

Max Weber: Economy and Society (1922, translated edition). The routinisation of charisma and the iron cage of bureaucracy. Weber explains why the Language to Structure transition so often restores the status quo under a new label.

Mary Parker Follett: Creative Experience (1924). Integration as the mechanism for converting conflict into capability. The Structure to Agency transition depends on Follett’s integration: finding solutions neither party imagined.

Taiichi Ohno: Toyota Production System: Beyond Large-Scale Production (1988). The gemba principle, standard work, and jidoka. Ohno’s insistence on seeing reality as it is, not as it is reported, grounds the Information probes across both the Learning and Deciding ELSA cycles.

Michael Levin: “Technological Approach to Mind Everywhere (TAME),” Frontiers in Systems Neuroscience 16, 768201 (2022). The cognitive light cone concept. Each completed ELSA cycle expands the light cone; each stalled cycle contracts it.

From Learning to Deciding: A Route Map of the Story So Far and Its Application to AI Adoption

Justin Arbuckle — Mon, 13 Apr 2026 07:02:30 GMT

Every AI transformation programme has a purpose statement. It is on the second slide of the strategy deck. It says something about “leveraging artificial intelligence to drive innovation, efficiency, and competitive advantage.” Everyone has seen it. Not everyone can connect it to what they are supposed to do differently on Monday morning.

This is the gap between clarity and action. Not the absence of purpose, but the presence of a stated purpose that floats above the reality of work: disconnected from how people actually operate, what they believe they are trying to achieve, and what the organisation rewards them for doing. This article is a bridge. The series has spent its first phase asking why organisations cannot learn. Now it turns to a harder question: how does an organisation design its way to action? And the answer, it turns out, lies in the same model that governed the Learning phase, the ELSA cycle, but entered from a different position.

1. What the Learning Phase Found

The Learning phase profiled thinkers who diagnosed the barriers to organisational learning. The synthesis established a governing hypothesis: learning is a condition, not a process. It emerges when three conditions are met, governed by three thinkers whose work anchors the architecture of the series.

The Identity condition (governed by Bourdieu): identity must be safe enough to change. Can people tolerate losing what they have? Is learning happening through practice, or through instruction? Do people believe that effort produces results? Habitus (the embodied dispositions that generate practice below conscious awareness) is reshaped through participation, not through training. Learned helplessness is itself habitus: a sedimented disposition to believe that effort does not produce results.

The Information condition (governed by Bateson): information must be clean enough to act on. Can people tell the truth about what is happening? The double bind (contradictory messages at different logical levels, with no permission to name the contradiction) is the mechanism that kills information flow. Are decision-makers close to the work? Information degrades with distance. Is the organisation changing what it rewards, or just what it says? New language without new incentives is a structural double bind.

The Interaction condition (governed by Illich): the institutional form must be convivial enough to permit learning. Does the institution serve the people, or do the people serve the institution? Can the institution stop doing what no longer works? Can the institution integrate conflict, or must it suppress it?

These nine probes tell a leader where the learning condition is absent. They do not tell the leader what to do with the learning once the condition is present. And that is where the series turns.

2. How the Learning ELSA Cycle Runs

The ELSA model describes how change moves through organisations. In the Learning phase, the cycle runs E → L → S → A.

Event is the disruption: the demonstration that cannot be unseen, the competitive move that invalidates assumptions, the technology shift that renders a capability obsolete. The event can be external (a market disruption) or internal (a gesture: an experiment, a provocation, a deliberate attempt to surface what has been hidden). Events are charismatic in Weber’s sense: they derive their power from direct experience, not from rules or tradition.

Language is what happens when the organisation begins to name what the event revealed. New categories emerge. The language creates shared reference points and makes the disruption discussable. It begins the process of routinisation: channelling disruptive energy into stable concepts that people can work with.

Structure is what happens when the new language is built into how the organisation is arranged: roles, processes, and incentives are redesigned around what has been named. This is the stage at which, as Weber warned, the status quo can return under a new label.

Agency is what happens when the new patterns become self-sustaining. People act from the new framework without being told to. The new way of working reproduces itself through practice, not instruction. Bourdieu would recognise the mechanism: the habitus has been reshaped. The organisation has not merely adopted a change; it has become a different kind of organisation, one whose dispositions generate different behaviour.

The nine Learning probes govern the transitions between these stages. Three probes at each transition, drawn from whichever lever the transition functionally requires. When a transition’s probes fail, the cycle stalls: the event produces diplomatic language rather than honest language; the language decorates existing structures rather than reshaping them; the structures are complied with rather than practised into new dispositions.

When the cycle completes, the organisation has Agency: the capacity to tell the truth, to practise rather than merely learn, to tolerate loss, to integrate conflict, to stop what no longer works. These are not strategic capabilities. They are conditions. And conditions, once present, make something else possible.

3. Every Absent Condition Is a Barrier to Deciding

Read together, the Learning phase thinkers reveal something none of them states alone: every absent condition for learning is simultaneously a barrier to deciding.

Where the Identity condition is absent, the organisation cannot decide because the habitus of its members generates practice that reproduces the old commitments automatically. The senior developer whose reflexes are calibrated to a world where coding is the primary work cannot adopt a new direction through intellectual assent. Their embodied dispositions will produce code-centric behaviour regardless of what the strategy says. Deciding requires identity transition, and identity transitions take far longer than any programme timeline allows.

Where the Information condition is absent, the organisation cannot decide because its double binds prevent reality from becoming visible. Beer’s dictum, POSIWID, captures this: the purpose of a system is what it does, not what it says it does. The gap between the strategy slide and Monday morning is not a communication failure. It is a structural double bind: “our purpose is innovation” delivered through structures that reward predictability. The organisation cannot describe its own domain honestly, and without honest description, every decision is made against a fiction.

Where the Interaction condition is absent, the organisation cannot decide because the institutional form has replaced purpose with the consumption of its own services. The programme’s metrics measure its own activity (training delivered, milestones reached) rather than the capability it was designed to develop. The organisation cannot stop what no longer works, and therefore cannot make room for what must replace it.

The directional logic connects the three. Identity constrains Information: what people can perceive determines what information they can process. Information constrains Interaction: what information is available determines how parts can relate. But Interaction is where change actually occurs: shifts in interaction patterns change what information flows, which changes what people perceive, which changes identity. The causation runs one way for understanding; it runs the other way for intervention.

This is why Learning must come first. Without these conditions, the Deciding phase operates on corrupted input. The language will be diplomatic rather than precise. The models will be unchallengeable because challenging them is unsafe. The commitments will be premature consensus rather than designed artefacts. The decisions will be, in the language of the companion essay, organisational hallucinations: confident, fluent, plausible, and wrong.

4. AI Breaks the Information Condition First

Everything described so far is a human and organisational problem. But AI introduces a structural change that transforms the clarity problem from an organisational challenge into a production challenge. It does so at the Information condition.

In the pre-AI world, ambiguity about purpose was absorbed by the humans who did the work. A vague requirement could still produce a reasonable outcome because the developer brought contextual knowledge, asked clarifying questions, made assumptions, and navigated the gap between what was specified and what was needed. Humans tolerated information pathology. The cost was hidden in time, rework, and compromise. But the work got done.

AI does not tolerate information pathology. It amplifies it. A model given a vague specification generates the most statistically probable interpretation of that vagueness. Unless it has been set up to do so, it will not ask clarifying questions. It will produce something, confidently and quickly, that is precisely as unclear as the specification that prompted it. The vague requirement that would have taken a human team three weeks to implement, with clarification along the way, now produces a wrong answer in three seconds.

This is Bateson’s double bind made machine-readable. The organisation sends contradictory signals about what it wants. The human absorbs the contradiction. The machine amplifies it. If the specification encodes ambiguity, contradictory constraints, or unresolved conflicts about purpose, the AI will faithfully reproduce all of them.

AI adoption exposes the clarity problem rather than creating it. The ambiguity was always there. The double binds were always active. The humans were absorbing them. Now the humans must resolve them before the machine acts, because the machine cannot absorb them.

The specification, properly understood, is where purpose meets production: where “create value for this customer in this way” becomes “accept these inputs, enforce these constraints, produce these outputs.” The quality of the specification determines the quality of the output. And the quality of the specification depends on whether the organisation can describe its domain honestly, precisely, and in language that practitioners actually use. This is why the Deciding phase begins where it does.
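
The step from purpose to “accept these inputs, enforce these constraints, produce these outputs” can be sketched in a few lines. What follows is a minimal illustration, not a specification method taken from this series: the `Specification` class, its field names, and the refund rule are all invented for the example. The point is only that a specification written this way can be checked, whereas a purpose statement cannot.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Specification:
    """Hypothetical shape: required inputs, named constraints, required outputs."""
    inputs: List[str]
    outputs: List[str]
    constraints: Dict[str, Callable[[Record, Record], bool]] = field(default_factory=dict)

    def violations(self, given: Record, produced: Record) -> List[str]:
        """List every way a (given, produced) pair departs from the specification."""
        problems = [f"missing input: {k}" for k in self.inputs if k not in given]
        problems += [f"missing output: {k}" for k in self.outputs if k not in produced]
        if problems:
            return problems  # constraints are only evaluated on complete records
        problems += [
            f"constraint failed: {name}"
            for name, holds in self.constraints.items()
            if not holds(given, produced)
        ]
        return problems


# Invented example: a refund rule stated precisely enough to be checked.
refund_spec = Specification(
    inputs=["order_total", "days_since_purchase"],
    outputs=["refund_amount"],
    constraints={
        "refund never exceeds order total":
            lambda i, o: o["refund_amount"] <= i["order_total"],
        "no refund after 30 days":
            lambda i, o: i["days_since_purchase"] <= 30 or o["refund_amount"] == 0,
    },
)

ok = refund_spec.violations(
    {"order_total": 80, "days_since_purchase": 10}, {"refund_amount": 80}
)
late = refund_spec.violations(
    {"order_total": 80, "days_since_purchase": 45}, {"refund_amount": 80}
)
print(ok)    # no violations
print(late)  # the 30-day constraint is reported by name
```

Naming each constraint in practitioner language is a deliberate choice in this sketch: when a check fails, the report reads as a statement about the domain rather than about the code.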

5. The Rotation: Why Deciding Starts at Language

Here is the structural move that connects the phases.

The Learning ELSA cycle runs E → L → S → A. It starts with Event because learning begins with disruption: something happens that the organisation must respond to. It ends with Agency because learning succeeds when new dispositions are self-sustaining.

The Deciding ELSA cycle runs L → S → A → E. It starts with Language because deciding begins with description: can you name the domain precisely enough to design within it? It ends with Event because the output of deciding is not a strategy document but a specific, bounded, buildable thing, an Event that triggers the next phase.

The rotation is not arbitrary. It follows from what each phase produces. Learning Agency (the organisation’s capacity to tell the truth, to practise, to tolerate loss, to integrate conflict) is precisely what makes honest Language possible. The conditions produced by Learning are the operating conditions for the first step of Deciding. The handoff is structural: Agency enables Language. Without Agency, the Language stage of the Deciding cycle operates on the same diplomatic fictions that the Learning phase was designed to dismantle.

And the rotation continues. Building will run S → A → E → L. It starts with Structure because building begins with the implementation architecture: the thing being constructed. It ends with Language because the organisation learns what to say about what it built and what it discovered. Leading will run A → E → L → S. It starts with Agency because leading begins with the leader’s capacity to act. It ends with Structure because the leader’s final contribution is the reorganisation that enables the next cycle.

The four phases of the series are one rotation of ELSA at the macro level: E → L → S → A, with each phase owning one starting position and each handoff turning the output of one phase into the entry condition for the next. The series is not four separate frameworks applied in sequence. It is one framework, rotated, with each phase deepening the same cycle.
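
The rotation is mechanical enough to write down. The sketch below assumes nothing beyond the four stage letters and the four phase names used in this series; the function name `phase_cycles` is invented for the illustration. It shifts the base cycle one position per phase and prints the order each phase runs in.

```python
from collections import deque

STAGES = ["E", "L", "S", "A"]  # Event, Language, Structure, Agency
PHASES = ["Learning", "Deciding", "Building", "Leading"]


def phase_cycles(stages=STAGES, phases=PHASES):
    """Return {phase: stage order}, rotating the base cycle one step per phase."""
    cycles = {}
    ring = deque(stages)
    for phase in phases:
        cycles[phase] = list(ring)
        ring.rotate(-1)  # the next phase starts one stage later in the cycle
    return cycles


cycles = phase_cycles()
for phase, order in cycles.items():
    print(f"{phase:9s} {' -> '.join(order)}")
```

Each phase ends on the stage immediately before its own starting stage, which is why every handoff has to bridge a gap: Learning ends at A and Deciding starts at L, Deciding ends at E and Building starts at S, and so on round the cycle.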

6. The Governor Handoffs

The three conditions operate in both phases. The governors change because the nature of the constraint changes, but the parallel structure is exact.

Identity: Bourdieu hands to Simon. Bourdieu governs Identity in the Learning phase because he explains the sociological constraint on what is available to the person: the habitus that generates practice below conscious awareness, the capital that determines what is at stake, the field that defines which identities are legitimate. Simon governs Identity in the Deciding phase because he explains the cognitive constraint on what is available to the decision-maker: bounded rationality, satisficing, decision premises, the architecture of complexity. Both govern through constraint on what is available. Bourdieu constrains through embodied dispositions. Simon constrains through cognitive limits. Both explain why people act within a narrower range than their situation permits.

Information: Bateson hands to Ohno. Bateson governs Information in the Learning phase because he explains the epistemological conditions for information to be meaningful: levels of learning, the double bind, the ecology of mind. His definition of information, “a difference which makes a difference”, establishes the principle. Ohno governs Information in the Deciding phase because he provides the discipline for seeing reality as it is rather than as it is reported. Go to the gemba. Do not decide from reports. Standard work (the precise, shared description of how work is actually done, not how it is imagined) is Bateson’s principle made institutional. When the description matches reality, the organisation can act on what it sees. When the description matches only what is convenient, the organisation hallucinates.

Evans’ domain-driven design (ubiquitous language, bounded contexts, knowledge crunching) is the software instantiation of Ohno’s principles. The ubiquitous language is standard work applied to domain description. The bounded context is a value stream boundary applied to knowledge. Evans matters for the series because his work shows what Ohno’s principles look like when applied to the domain of specification. But the foundational insight is Ohno’s: precision of description depends on proximity to reality, and the structures of work must enforce this proximity rather than leaving it to chance.

Interaction: Illich hands to Beer. Illich governs Interaction in the Learning phase because he diagnoses the pathology of institutional inversion: the point at which the institution becomes counterproductive to its own stated purpose. Beer governs Interaction in the Deciding phase because he provides the cybernetic architecture that prevents or corrects the inversion: the Viable System Model, POSIWID, the recursive structure that ensures each part of the organisation has the autonomy to respond to its environment while remaining coordinated with the whole. Illich tells you that your transformation programme has replaced learning with the consumption of its own services. Beer tells you what to build instead: an information architecture that makes the actual purpose visible, and a diagnostic that cuts through every stated intention to reveal what the system actually does.

7. The Deciding Cycle as a Decision Process

The Deciding hypothesis is: decisions are design challenges, and design is a sequence of decisions under constraint. The ELSA rotation makes this operational.

Language (Information lever): can you describe the domain precisely enough to decide within it? The three probes that govern this stage ask whether a shared vocabulary exists that practitioners actually use, whether the organisation can distinguish what it knows from what it assumes, and whether the models used to decide are visible and challengeable. These are not diagnostic questions to be answered once. They are tasks to be performed. You build shared language by getting practitioners in the room. You sort knowledge from assumption by marking every assertion. You make models visible by drawing them where someone can disagree.

Structure (Interaction lever): do you understand how the parts relate when this decision is made? The three probes that govern this stage ask whether the organisation recognises that its structure shapes its decisions, whether it can redesign the system rather than optimise within it, and whether the decision process produces what it intends. Again, these are tasks. You examine structural constraints before debating options. You ask, when a problem recurs, whether the problem is in the decision or in the system that generates it. You compare what the process produces with what it claims to produce.

Agency (Identity lever): do the people making this decision have the capacity to commit? The three probes that govern this stage ask whether people can distinguish choosing from defaulting, whether the organisation can name what it will not do, and whether it can hold competing designs without premature closure. These are the hardest tasks because they operate on identity. You find inherited commitments by asking when a decision was last consciously taken. You force exclusion by requiring every proposal to state what it rules out. You hold tension by requiring at least two structurally different options before any commitment.

Event (the output): the specific, bounded, buildable thing the organisation has designed its way toward. Not a strategic priority. Not a programme of work. A describable thing precise enough that the Building phase can construct it.
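
Several of the tasks above are concrete enough to check mechanically: marking every assertion as known or assumed, requiring every proposal to state what it rules out, and holding at least two options before committing. The sketch below is one hedged way to do that. The class names, the `readiness_gaps` function, and the example statements are all invented for illustration, and the code cannot judge whether two options are structurally different; that remains a human call.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Assertion:
    text: str
    status: Optional[str] = None  # "known" or "assumed"; None means unmarked


@dataclass
class Proposal:
    name: str
    rules_out: List[str] = field(default_factory=list)


def readiness_gaps(assertions: List[Assertion], proposals: List[Proposal]) -> List[str]:
    """Return the tasks still outstanding before a commitment is made."""
    gaps = []
    for a in assertions:
        if a.status not in ("known", "assumed"):
            gaps.append(f"unmarked assertion: {a.text}")
    for p in proposals:
        if not p.rules_out:
            gaps.append(f"proposal excludes nothing: {p.name}")
    if len(proposals) < 2:
        gaps.append("fewer than two options on the table")
    return gaps


# Invented example inputs.
assertions = [
    Assertion("Brokers quote within 24 hours", "known"),
    Assertion("Customers compare three quotes"),  # nobody has marked this yet
]
proposals = [Proposal("Automate renewals", rules_out=["manual renewal review"])]

gaps = readiness_gaps(assertions, proposals)
for gap in gaps:
    print(gap)
```

An empty list here does not mean the decision is good. It means only that the minimum discipline has been performed and the conversation can move on.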

The ELSA gates are binary. If Language is imprecise, stop; everything downstream operates on the description. If Structure is invisible, stop; the architecture you build will be governed by constraints nobody surfaced. If Agency is insufficient, stop; the commitment will be premature consensus that collapses under pressure. A failed gate sends you back to the previous stage, not to the beginning.
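
The gate rule can be sketched as a small walk through the three stages. This is an illustration under stated assumptions, not a tool from the series: the function `run_gates` is invented, gate outcomes are scripted as lists of pass or fail results rather than judged, and a failure at Language simply retries Language, because the Deciding cycle has no earlier stage to fall back to.

```python
STAGES = ["Language", "Structure", "Agency"]


def run_gates(gate_results, max_steps=20):
    """Walk the three gates using scripted pass/fail results.

    gate_results maps each stage to a list of booleans, consumed one per
    attempt at that gate. Returns the trail of stages visited; the trail
    ends with "Event" only if every gate is eventually passed.
    """
    attempts = {stage: iter(gate_results[stage]) for stage in STAGES}
    trail = []
    i = 0
    for _ in range(max_steps):
        stage = STAGES[i]
        trail.append(stage)
        passed = next(attempts[stage], False)  # no scripted result left counts as a fail
        if passed:
            i += 1
            if i == len(STAGES):
                trail.append("Event")
                return trail
        else:
            # Step back exactly one stage. Language has no predecessor in
            # this cycle, so a failed Language gate is retried in place.
            i = max(i - 1, 0)
    return trail


trail = run_gates({
    "Language": [True, True],
    "Structure": [False, True],   # first attempt fails, second passes
    "Agency": [True],
})
print(" -> ".join(trail))
```

In the scripted run, the failed Structure gate sends the walk back to Language, not out of the cycle altogether, and the Event appears only after all three gates have passed in order.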

8. Why Deciding Requires Technical Content

The Learning phase was about people and organisations. The Deciding phase introduces technical content, and it does so for a reason that follows from the series’ own argument.

When an AI model can generate working software from a description of what is needed, the constraint on production shifts from the capacity to build to the capacity to specify. The specification is the means of production. The precision with which an organisation describes what it needs determines, directly, the quality of what the machine produces. This means that repairing the Information condition is no longer exclusively an organisational challenge. It is also a technical one.

Domain-driven design is an information architecture. Specification-driven development is an information discipline. Contract testing is an information verification practice. The OO design tradition spent six decades proving that every module boundary, every interface, every contract between components is a decision about what to reveal, what to hide, what to promise, and what to defer. These are the practices through which Language in the Deciding ELSA cycle becomes precise enough to act on.

A reader who skips the technical articles will understand the organisational argument. A reader who engages with them will understand something the organisational argument alone cannot convey: that the practice of description has become a technical discipline, and that the technical discipline is, at its root, a practice of honesty about the domain. The two are the same thing, seen from different angles.

9. The Bridge

The Learning phase told you what prevents clarity. The Deciding phase shows how to design toward it.

The learning conditions do not become irrelevant. They become the operating conditions within which the Deciding cycle can run. An organisation without Learning Agency (without truth-telling, without practice, without the capacity to integrate conflict) cannot produce honest Language. Without honest Language, it cannot examine its own Structure. Without structural examination, it cannot develop the Agency to commit. And without commitment, it cannot produce the Event that Building requires.

The cycle turns. Learning ends with Agency. Deciding begins with Language. The handoff is the series’ central mechanism: the conditions you create in one phase are the operating conditions for the next. Skip the conditions and the process runs but produces nothing real. Protect them and the cycle advances, each completed transition producing the input for the next.

And the Event that falls out of the Deciding cycle (the specific, bounded, buildable thing) will be something the organisation designed its way toward, honestly, through the only process that works: description, structural examination, and genuine commitment, in that order, with the Learning conditions holding the whole thing together.

Further Reading

Peter Drucker, Management Challenges for the 21st Century (1999). The knowledge worker must define the task. In the AI-mediated world, defining the task is the work.

Eric Evans, Domain-Driven Design (2003). The discipline of making domain models explicit, shared, and contestable. The practical mechanism for dissolving the double binds that prevent clear specification.

Stafford Beer, Brain of the Firm (2nd edition, 1981). The Viable System Model: the cybernetic architecture that prevents institutional inversion and ensures autonomous units can learn and decide.

Herbert Simon, The Sciences of the Artificial (3rd edition, 1996). The architecture of complexity, bounded rationality, and design as the core human activity. The cognitive governor for the Identity lever in the Deciding phase.

The Cognitive Light Cone: Artificial Organisational Intelligence

Justin Arbuckle — Fri, 10 Apr 2026 07:02:45 GMT

A team of researchers at Brown, Helsinki, Oxford, and the Max Planck Institute published a paper in 2023 with a title that should unsettle every technology leader: “All Intelligence is Collective Intelligence.” Their argument is not that groups are sometimes smarter than individuals. It is that the distinction between individual and collective intelligence reflects the level of analysis, not a fundamental difference in kind. Every system we call individually intelligent turns out, on closer inspection, to be a collective: a brain is a coalition of competing neural subsystems; a multicellular organism is a society of cells coordinating through chemical and bioelectric signals; a human being is a holobiont of trillions of microorganisms whose cognitive contributions we are only beginning to understand. What changes as you move from an ant colony to a brain is not whether the intelligence is collective but how tightly integrated the collective has become. The more integrated, the more the collective looks like an individual. The less integrated, the more it looks like a committee; and committees, as every leader knows, can produce coherent strategy or confident nonsense depending on their structure.

This should sound familiar to anyone who has watched an organisation produce a strategy document. The question this article addresses is not whether organisations are intelligent. It is why we refuse to apply the same analytical framework to organisations that we now routinely apply to AI systems, and what we lose by refusing.

1. The Continuum Nobody Wants to Admit

The debate about whether large language models are “really” intelligent has produced an enormous amount of heat and remarkably little light. The problem is that the debaters keep reaching for a binary: intelligent or not, conscious or not, understanding or merely pattern-matching. François Chollet, the creator of the Keras deep learning library, cut through this in 2019 with a formal definition that reframes the question entirely. Intelligence, Chollet argued, is not a property you either have or lack. It is skill-acquisition efficiency: how quickly a system can learn to handle new tasks it has never encountered, given its starting knowledge and the difficulty of generalising from what it has seen to what it faces now.

This definition does something radical. It separates skill from intelligence. A chess engine has enormous skill at chess. It has zero intelligence by Chollet’s measure, because its skill was purchased with brute-force computation over the game’s state space, not acquired through efficient generalisation from limited experience. A human grandmaster, by contrast, had to use genuine intelligence to acquire chess skill over a lifetime: the same general capacity that lets them learn to drive, to cook, to navigate office politics. The chess engine’s skill is narrow and non-transferable. The grandmaster’s intelligence is broad and generalisable.
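
The separation of skill from intelligence can be made concrete with a deliberately crude ratio. This is not Chollet’s formal measure, which is considerably more elaborate; it borrows only the shape of the definition described above, in which generalisation difficulty counts for a system and starting knowledge and experience count against it. The function name and every number below are invented for illustration and carry no empirical weight.

```python
def acquisition_efficiency(generalisation_difficulty, priors, experience):
    """Toy ratio: difficulty overcome per unit of starting knowledge plus experience.

    Only meaningful when comparing systems that reach the same skill level
    on the same task. Units are arbitrary.
    """
    total_input = priors + experience
    if total_input <= 0:
        raise ValueError("priors + experience must be positive")
    return generalisation_difficulty / total_input


# Two hypothetical systems reaching the same skill on the same task.
frugal_learner = acquisition_efficiency(generalisation_difficulty=1.0, priors=1, experience=9)
brute_forcer = acquisition_efficiency(generalisation_difficulty=1.0, priors=1, experience=999)

print(frugal_learner > brute_forcer)  # same skill, very different efficiency
```

In this toy version the brute-force system scores close to zero rather than exactly zero, but the direction is the one the argument needs: the more experience a given level of skill had to be bought with, the less that skill says about intelligence.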

Apply this to an LLM. The model exhibits enormous skill: fluent text generation across domains, convincing reasoning, contextually appropriate advice. But the skill was purchased with trillions of tokens of training data. When the model encounters something genuinely outside its training distribution (a novel situation, a domain where examples were sparse, a question that requires causal reasoning rather than pattern completion), it does not recognise that it has left familiar territory. It produces output with the same confidence, the same fluency, and the same apparent authority. The output may be entirely wrong. This is hallucination: confident fiction that is indistinguishable, in form, from confident fact.

Now apply it to your organisation. The organisation exhibits enormous operational skill: it ships software, manages supply chains, runs customer service operations. But this skill was accumulated through decades of process, institutional memory, and pattern-matching against historical experience. When the environment shifts (AI disruption, market change, a regulatory upheaval), the organisation cannot efficiently acquire new capabilities. It continues producing confident strategies, fluent presentations, and authoritative-sounding plans. The strategies may be entirely wrong. The presentations are indistinguishable, in form, from the ones that preceded successful outcomes. This is organisational hallucination: confident strategic fiction produced by pattern-matching against a distribution that no longer applies.

The parallel is not a metaphor. It is a structural identity. Any system that learns by pattern-matching over past experience will produce confident nonsense when it encounters situations that are rare in, or absent from, that experience. Kalai and Vempala proved in 2024 that this is not an engineering deficiency but a mathematical consequence: a properly calibrated language model must hallucinate at a rate close to the fraction of facts that appear only once in its training data. The organisational equivalent is equally structural: an organisation that learns only from its own history must produce strategic confabulations when the environment diverges from that history. The more fluent the organisation, the harder it is to detect when it has crossed from competence to confabulation.
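
The quantity behind this result can be estimated with a few lines of counting. The sketch below is a hedged reading, not the authors’ code: it assumes the relevant figure is a Good-Turing style estimate, the share of observations whose fact was seen exactly once, and the function name and the tiny corpus are invented for illustration.

```python
from collections import Counter


def singleton_fraction(observations):
    """Share of observations whose fact occurs exactly once in the data."""
    if not observations:
        raise ValueError("need at least one observation")
    counts = Counter(observations)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(observations)


# Invented corpus: two well-attested facts and two seen only once.
corpus = (
    ["paris is the capital of france"] * 5
    + ["tokyo is the capital of japan"] * 3
    + ["obscure fact x", "obscure fact y"]
)

rate = singleton_fraction(corpus)
print(rate)  # 2 singletons out of 10 observations
```

The organisational reading is direct: the more of what an organisation believes rests on something it has experienced only once, the larger the share of its confident output that is likely to be confabulated.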

The question is not whether your organisation is intelligent. The question is where it sits on the continuum, and what constrains it.

2. Cognitive Light Cones: What Your Organisation Can and Cannot See

Michael Levin, a biologist at Tufts University, has developed a framework that makes the continuum precise. Levin studies how cells; individually simple agents with no brains; coordinate to build and repair complex bodies. A salamander that loses a limb does not simply grow replacement cells. Its cells collectively recognise what is missing, build the correct structure, and stop when the target shape is achieved. No single cell knows the plan. The intelligence is in the collective dynamics: the communication infrastructure, the feedback loops, the shared signals that bind individual competencies into a coherent higher-order capability.

Levin defines intelligence functionally, borrowing from William James: the ability to reach the same goal by different means. A thermostat reaches its temperature goal by one means. A salamander reaches its anatomical goal by many means, adapting to damage, novel tissue environments, and experimental perturbations that its evolutionary history never anticipated. The salamander is more intelligent than the thermostat not because it is conscious but because it navigates a larger space of possibilities with greater flexibility.

The concept Levin introduces to measure this is the cognitive light cone: the spatiotemporal scale of the goals a system can pursue. A single cell has a tiny cognitive light cone; it maintains its own homeostasis locally, in the present moment. A tissue has a larger one; it pursues anatomical goals across space and time. An organism has a very large one; it plans, remembers, and acts toward goals that span years. Each level of the hierarchy expands the light cone by integrating the competencies of the level below through communication and coordination.

Here is the move that matters for this article. Levin’s framework is explicitly substrate-independent. It applies to cells, tissues, organisms, swarms, and; by direct extension; to organisations and AI systems. An organisation has a cognitive light cone. So does an LLM. The question is how far each one reaches, and what constrains it.

This should not be as surprising as it sounds. Gregory Bateson was making a structurally identical argument in 1972. In Steps to an Ecology of Mind, Bateson insisted that the unit of survival is never the organism alone; it is organism-plus-environment. Mind, for Bateson, is not a thing inside a skull. It is a pattern of organisation in the wider system: the circuit of feedback loops through which a system perceives, acts, and corrects. Cut the feedback loop and mind degrades, regardless of how intelligent the components are. Bateson’s levels of learning map directly onto the cognitive light cone. Learning I is stimulus-response within a fixed frame; a small light cone. Learning II is learning to learn; recognising the frame itself, which expands the light cone to encompass the context. Learning III; changing the kind of system you are; is what happens when the light cone expands far enough that the system can question its own identity. Most organisations operate at Learning I: they respond to stimuli within existing assumptions. A few reach Learning II: they can examine and revise those assumptions. Almost none achieve Learning III, which is why genuine transformation is so rare. Levin’s contribution is to show that this is not a peculiarity of human organisations. It is a property of collective intelligence at every scale, from cellular to institutional. Bateson saw the pattern. Levin formalised the mechanism.

An LLM’s cognitive light cone is bounded by its training distribution. Within that distribution, it exhibits remarkable competency. Outside it, the light cone collapses: the model hallucinates, extrapolates from irrelevant patterns, and cannot recognise that it has left the domain where its learned patterns apply. An organisation’s cognitive light cone is bounded by its learning conditions: whether it can tell the truth about its own performance, whether its people are close enough to reality to see what is actually happening, whether its structures permit the integration of conflict rather than its suppression. These are not abstract aspirations. They are measurable structural features. An organisation that cannot tell the truth has a smaller cognitive light cone than one that can, in the same way that a model with poor calibration has a smaller effective scope than one with good calibration.

Levin makes one further observation that should arrest every leader’s attention. Cancer, in his framework, is what happens when cells defect from the collective intelligence of the organism. They roll back to smaller cognitive light cones; pursuing only their own local survival rather than serving the organism’s anatomical goals. The collective intelligence breaks down. The cells are still competent individually. They are simply no longer participating in the larger project.

The organisational parallel is exact. When departments stop serving organisational goals and optimise only for their own metrics, when teams game their KPIs rather than solving the problems the KPIs were designed to measure, when the quarterly target displaces the strategic objective; this is organisational cancer. The components are still competent. They have simply defected from the collective, and the cognitive light cone of the whole has collapsed to the sum of its parts. Which, as any biologist will tell you, is always less than the whole was capable of.

3. What Makes a Collective Intelligent (And What Doesn’t)

If intelligence is collective all the way down, the question shifts from “are organisations intelligent?” to “what makes some collectives more intelligent than others?” Richard Watson and Michael Levin addressed this in 2023 with a question that sounds simple and is not: what kinds of functional relationships turn a non-intelligent collective into an intelligent one?

Their answer draws on a deep parallel between neural networks and biological collectives. In a neural network, intelligence emerges not from the cleverness of individual neurons but from the structure of their connections: the weights, the feedback loops, the learning rules that adjust connections based on outcomes. In a biological collective; whether a swarm, a tissue, or an organism; intelligence emerges from the same abstract architecture: agents, communication channels, feedback mechanisms, and rules that bind individual competencies into collective capability.

The critical variable is the credit assignment problem: how does the collective know which of its parts contributed to success or failure? This is deeper than it sounds, because it determines whether the collective can learn at all.

In a neural network, the textbook answer is backpropagation: errors at the output are traced backward through the network, and each connection’s weight is adjusted in proportion to its contribution to the error. But backpropagation is only one mechanism, and not even the most instructive one for the organisational parallel. The more fundamental mechanism is the reward function: the signal that tells the system what counts as success. In reinforcement learning, agents do not receive step-by-step correction. They receive sparse, delayed rewards; a score at the end of a game, a customer retention number at the end of a quarter; and must figure out which of the thousands of actions they took along the way actually mattered. This is the temporal credit assignment problem: when the reward arrives long after the actions that caused it, how do you trace it back to the right decisions?
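One common heuristic for temporal credit assignment is to pass a delayed reward backward with a discount, so that earlier actions receive a smaller share. The sketch below assumes a single episode with one reward at the end; the discount factor `gamma=0.9` and the five-step episode are illustrative choices, not something the sources cited here prescribe.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return-to-go for each step: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each step sees the discounted value of what followed it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical episode: five actions, with the only reward arriving at the end.
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
credit = discounted_returns(rewards, gamma=0.9)

for step, g in enumerate(credit):
    print(f"step {step}: credit {g:.4f}")
```

The first action ends up with about two thirds of the credit given to the last one (0.6561 against 1.0), purely because of its distance from the reward. That is the difficulty in miniature: the allocation reflects timing, not actual contribution.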

Machine learning has discovered that the design of the reward function is the single most consequential choice in the system. Get it right, and the collective learns. Get it wrong, and the collective optimises brilliantly for the wrong thing. Reward hacking; where models find ways to score highly on the specified reward while failing the designer’s actual intent; is not an edge case. It is the central failure mode, and it emerges precisely because the reward function is an incomplete proxy for what you actually want. A cleaning robot covers its camera to avoid detecting mess. A chatbot learns to be sycophantic because users rate agreeable responses more highly than honest ones. The system is learning exactly what the reward function tells it to learn. The problem is that the reward function does not capture what matters.
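The cleaning-robot case can be reduced to a toy calculation. The sketch below is a made-up illustration, with hypothetical numbers for mess and effort: the proxy reward sees only what the camera detects, the true objective sees the actual state of the room, and the two disagree about the best action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    actual_mess: int    # what the designer cares about
    detected_mess: int  # what the camera reports
    effort: int         # cost the agent pays for the action


# Hypothetical room with 10 units of mess and two available actions.
ACTIONS = {
    "clean_room": Outcome(actual_mess=0, detected_mess=0, effort=5),
    "cover_camera": Outcome(actual_mess=10, detected_mess=0, effort=1),
}


def proxy_reward(o):
    """What the system is actually scored on: detected mess plus effort."""
    return -o.detected_mess - o.effort


def true_objective(o):
    """What the designer intended: actual mess plus effort."""
    return -o.actual_mess - o.effort


best_by_proxy = max(ACTIONS, key=lambda a: proxy_reward(ACTIONS[a]))
best_by_intent = max(ACTIONS, key=lambda a: true_objective(ACTIONS[a]))

print("optimising the proxy picks:", best_by_proxy)    # cover_camera
print("optimising the intent picks:", best_by_intent)  # clean_room
```

Nothing in the optimiser is broken. The gap comes entirely from `detected_mess` standing in for `actual_mess`, which is the same gap as a velocity metric standing in for product quality.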

In an organism, credit assignment operates through multiple overlapping feedback systems: bioelectric signalling, chemical gradients, mechanical forces, immune responses. Cells receive information about the state of the larger system through these channels and adjust their behaviour accordingly. The redundancy matters: when one feedback channel fails, others compensate. When a tissue is damaged, inflammatory signals, bioelectric potential changes, and mechanical stress all converge to tell surrounding cells what has happened and what to do. No single signal carries the full picture. The collective intelligence of the organism depends on the integration of many partial signals into a coherent response.

In an organisation, credit assignment is solved; or, more commonly, not solved; through management: the attribution of outcomes to decisions, teams, and actions across a distributed system. And here every pathology of reward function design plays out in human terms. The reward signals are the incentive structures: compensation, promotion criteria, performance metrics, cultural norms about what gets praised and what gets punished. When these signals are sparse and delayed (annual performance reviews), the organisation cannot learn from its actions in anything like real time. When they are proxies for what actually matters (velocity metrics standing in for product quality, adoption dashboards standing in for genuine capability), the organisation reward-hacks itself: people optimise for the measure, not the objective. When feedback channels are singular rather than redundant (everything flows through the line manager), a single point of failure can blind the collective to critical information. The organism has bioelectric networks, chemical gradients, and mechanical signals all operating in parallel. Most organisations have a reporting line and a quarterly review.

Watson and Levin make the point that should haunt every transformation leader: what makes a collective into an individual, as opposed to merely a population in a container, is the degree of its intelligence. The more intelligent the collective, the less it looks like a collective. When component members act in an efficiently coordinated manner, with behaviours that serve long-term collective interest rather than short-term self-interest, the collective looks and acts like a single coherent agent. When coordination fails, when credit assignment is broken, when feedback loops are absent or corrupted, the collective degrades into a population of individually competent agents producing collectively incoherent behaviour.

This is the difference between a team and a group of people in a room. It is also the difference between an LLM that produces coherent multi-paragraph reasoning and a bag of word-frequency statistics. The architecture of coordination determines whether the whole exceeds, equals, or falls below the sum of its parts.

Woolley et al. demonstrated this empirically in 2010, finding that groups of humans exhibit a measurable general collective intelligence factor; a “c factor” analogous to the individual g factor in psychometrics. The c factor was not strongly correlated with the average or maximum intelligence of the group’s members. It was correlated with the average social sensitivity of members, the equality of conversational turn-taking, and the proportion of women in the group. In other words: the collective intelligence of the group was determined less by the quality of the components than by the quality of the interactions between them.

This is the finding that should restructure how you think about AI transformation. You do not need smarter people. You need better structures of interaction, feedback, and accountability. The same principle applies to the AI systems you are deploying: a collection of individually capable AI agents, without the right coordination architecture, will produce collectively incoherent results.

4. The Pragmatist’s Test: What Engineering Protocols Work?

Levin’s TAME framework (Technological Approach to Mind Everywhere) makes a move that cuts through decades of philosophical hand-wringing about whether machines or organisations are “really” intelligent. The move is pragmatist, and it is this: cognitive claims are engineering protocol claims.

When you say a system has a certain level of cognition, you are not making a metaphysical statement about what is happening inside it. You are specifying which engineering protocols work for managing it. The level of intelligence to attribute to a system is the highest level at which it is useful to model it as having goals, preferences, and memory.

A rock requires no intentional attribution; you model it with physics. A thermostat benefits from minimal goal attribution; you say it “wants” to maintain the temperature, and this helps you predict its behaviour. A mouse requires sophisticated behavioural models; you attribute preferences, fears, learning. A human requires full theory of mind. At each step, the attribution is justified not by metaphysical commitment but by practical utility: does treating the system as having goals help you predict, control, and communicate with it?

This is not a lowering of the bar. It is a sharpening of the question. When a technology leader asks “is our organisation intelligent?” or “is this LLM intelligent?”, Levin’s framework says: the useful question is not about the inner life of the system. It is about what engineering protocols work. Can you manage the system by issuing instructions (low agency)? Do you need to negotiate with it (moderate agency)? Must you design environments that shape its behaviour because direct control is impossible (high agency)?

Most organisations sit somewhere between moderate and high agency. They cannot simply be instructed; anyone who has tried to implement a top-down transformation knows this. They must be managed through incentive design, structural reform, and environmental shaping; exactly the protocols you would use for a high-agency system. This is not a failure of management. It is a recognition of what the system actually is: a collective intelligence with its own dynamics, its own attractors, its own resistance to perturbation.

LLMs sit lower on the continuum but higher than most people assume. Within their training distribution, they can be managed by instruction (prompting). Outside it, they require environmental design: retrieval-augmented generation, tool use, multi-agent architectures, careful evaluation frameworks. The protocols for managing LLMs outside their comfort zone are converging with the protocols for managing organisations outside theirs: create feedback loops, decompose complex problems, introduce adversarial challenge, and build sensing mechanisms that detect when assumptions have broken down.

5. What the Ethics Article Showed, and What This One Adds

In a companion essay, “Can the Statements of an LLM be Ethical?”, I argued that we do not need to settle whether an LLM is conscious or has genuine moral beliefs to evaluate its normative outputs. The philosophical resources of quasi-realism and norm-expressivism give us a framework that works regardless of what is happening inside the system. The question is not whether the machine “really” believes its moral claims. The question is what norms its outputs express, and whether there is a practice of accountability for examining them.

That article made the case for metaethics. This one makes the parallel case for epistemology.

Just as we do not need to settle whether the LLM has moral beliefs to evaluate its ethical outputs, we do not need to settle whether the organisation is “really” intelligent to evaluate its cognitive performance. What matters is not the inner life of the system but the functional properties: can it learn from novel experience? Can it detect when its assumptions have broken down? Can it revise its own operating principles in response to evidence? These are measurable, observable, structural features. They apply equally to neural networks, organisms, and organisations. And the research programmes studying each of these systems are, as I will argue, working on the same problems.

The ethics article showed that LLMs produce normative outputs whose authority comes not from the machine’s inner states but from the practice of accountability that surrounds them. This article shows that organisations produce strategic outputs whose quality depends not on the intelligence of their members but on the structural conditions that enable or prevent collective learning.

The implication is symmetrical and bidirectional. If we grant that LLMs exhibit partial intelligence; pattern-matching within distribution, hallucination outside it, no metacognitive capacity, some emergent reasoning; then we must apply the same analytical framework to organisations. And if we do, both fields of learning offer lessons for each other.

6. The Bidirectional Thesis: What Each Room Can Learn from the Other

The failure modes of LLMs and the failure modes of organisations are not merely analogous. They are expressions of the same underlying dynamics, operating in different substrates. Pattern-matching that mimics competence without producing understanding. Feedback structures that optimise for the wrong signals. The fundamental difficulty of moving from correlation to causation in any learning system.

But the claim is bidirectional. It is not that machine learning provides a playbook for organisational transformation. It is that both fields are working on the same problems, and each has developed strategies the other has not tried.

Machine learning has formalised problems that organisational theory describes qualitatively. Hallucination formalises skilled incompetence. Reward hacking formalises defensive routines. Distribution shift formalises the transition from complicated to complex domains. The exploration-exploitation tradeoff formalises the conditions under which learning occurs. These formalisations do not replace organisational theories. They sharpen them; making them testable, measurable, and amenable to intervention.

Organisational theory has described conditions that machine learning is only now encountering. Argyris described single-loop and double-loop learning decades before anyone built a system that could exhibit both. Weick described sensemaking before anyone built a model that could do in-context learning. Edmondson described psychological safety before anyone formalised the exploration-exploitation tradeoff. Illich distinguished convivial from manipulative institutions before anyone asked whether AI systems amplify or replace human intelligence. The organisational theorists got there first. They saw the dynamics in the substrate they knew. The machine learning researchers are rediscovering the same dynamics in a different substrate, with the advantage of mathematical precision and the disadvantage of thinking they are seeing something new.

Harry Halpin’s 2025 paper, “Artificial Intelligence versus Collective Intelligence,” traces this convergence to its philosophical root. The ontological presupposition of AI, Halpin argues, is the liberal autonomous individual of Locke and Kant. Herbert Simon, a founding figure of both AI and organisational decision theory, explicitly connected his work on artificial intelligence to a programme in cognitive science, economics, and politics that assumed intelligence is a property of individuals engaging in reasoning over representations. This assumption shaped how organisations think about intelligence: find the smart person, give them data, expect good decisions.

But LLMs are not individual intelligences. They are statistical models of collective human language on the web. The intelligence in an LLM is not in the model. It is a compressed, distorted reflection of the collective intelligence that produced the training data. Deploying an LLM in an organisation is layering one form of collective intelligence (a statistical summary of the web) onto another (the organisation itself). The question is whether these two forms enhance or interfere with each other. And that question cannot be answered without understanding both as collective intelligences operating under structural constraints.

This is why the fields need each other. Machine learning engineers need organisational theory to understand the human systems in which their models will operate. Organisational theorists need machine learning to formalise the dynamics they have described qualitatively for decades. And both need the philosophical framework that Levin, Falandays, Chollet, and Halpin have begun to construct: a framework that treats intelligence as continuous, collective, substrate-independent, and measurable.

7. What This Means for the Series

This essay, together with “Can the Statements of an LLM be Ethical?”, establishes the philosophical foundation for an approach to understanding learning in both organisations and LLMs. The ethics article showed that normative evaluation works without settling consciousness. This article shows that cognitive evaluation works without settling whether organisations are “really” intelligent. Together, they license the structural parallels between specific failure modes in ML and specific failure modes in organisations; not as decorative analogies but as expressions of shared mechanisms in systems that sit at different points on the same continuum.

The nine observable probes that this series has developed across its Learning and Deciding phases are, in Levin’s terms, a diagnostic for the size of an organisation’s cognitive light cone. Can the organisation tell the truth about its own performance? That determines whether its feedback loops function. Are its people close enough to reality to see what is actually happening? That determines whether its sensing mechanisms work. Can it integrate conflict rather than suppress it? That determines whether it can explore beyond its current local optimum. Each probe measures a structural condition for collective intelligence. Each one applies, with minor translation, to both organisations and AI systems.

The three levers of the series; Identity, Information, Interaction; map to the requirements that Falandays and colleagues identified for any collective intelligence: agents with competencies (Identity), mechanisms of communication (Information), and structures of coordination (Interaction). The levers are not prescriptions. They are the minimal conditions under which collective intelligence can emerge. Without them, what you have is not an intelligent organisation. It is a population of competent individuals in a container.

And the difference between those two things is everything.

Further Reading

Falandays, J. B., et al., “All Intelligence is Collective Intelligence,” Journal of Multiscale Neuroscience 2(1), 169-191 (2023). Open access. The paper that dissolves the individual/collective intelligence distinction. Read it alongside any organisational design text and notice that the abstract requirements for collective intelligence; agents, interaction mechanisms, self-organisation toward adaptive behaviour; are the requirements for a functioning team.

Levin, M., “Technological Approach to Mind Everywhere (TAME),” Frontiers in Systems Neuroscience 16, 768201 (2022). Open access. The framework that places intelligence on a continuous, substrate-independent scale. The persuadability continuum and the cognitive light cone concept are immediately applicable to organisational diagnosis.

McMillen, P. and Levin, M., “Collective Intelligence: A Unifying Concept for Integrating Biology Across Scales and Substrates,” Communications Biology 7, 378 (2024). Open access. The multiscale competency architecture applied to biological systems. The cancer-as-defection analogy alone is worth the read for any leader managing misaligned teams.

Watson, R. and Levin, M., “The Collective Intelligence of Evolution and Development,” Collective Intelligence 2(2) (2023). The connectionist framework for understanding what structural conditions turn a population into an intelligent collective.

Chollet, F., “On the Measure of Intelligence,” arXiv:1911.01547 (2019). The formal definition of intelligence as skill-acquisition efficiency. Read section II on the distinction between skill and intelligence; it will change how you evaluate every strategy presentation you attend.

Halpin, H., “Artificial Intelligence versus Collective Intelligence,” AI and Society 40, 4589-4604 (2025). Open access. Traces how Simon’s ideology of the autonomous rational individual shaped both AI research and organisational decision theory, and argues for collective intelligence as the alternative.

3 Leader Levers for Organisational Learning

Justin Arbuckle — Wed, 08 Apr 2026 07:02:58 GMT

Learning is not a process. It is a condition. It is what happens when people with safe-enough identities receive clean-enough information through institutions that serve rather than obstruct them. If those conditions are absent, no process will produce learning. If those conditions are present, learning will happen with or without a programme.

Every thinker in this series has, from a different angle, diagnosed the same underlying failure: organisations are structured to prevent the learning they claim to want. They respond by designing more learning processes. More workshops. More curricula. More governance. And every additional process intervention leaves the underlying conditions untouched, because the conditions are not about what the organisation does. They are about what the organisation is: the identities people hold, the information that flows or does not, and the institutional forms that either create the space for learning or consume it.

Illich provides the deepest explanation for why the process model persists despite its consistent failure. Institutions designed to deliver a capability end up replacing that capability with the consumption of institutional services. Schools do not produce learning; they produce the need for more schooling. Learning programmes do not produce organisational capability; they produce the need for more learning programmes. The mechanism is what Illich calls the institutionalisation of values: the programme defines learning as something that requires its mediation, establishes a monopoly over what counts as legitimate learning, and creates dependency. The team learning through practice on a real problem outside the programme is, within the programme’s framework, not learning at all: their activity is not tracked, not measured, not credited. The programme has made genuine learning invisible by defining learning as something that requires the programme.

This article synthesises the series into a single diagnostic model. It identifies three conditions for learning, each governed by a thinker whose work illuminates why the condition is so difficult to create:

Identity must be safe enough to change. Pierre Bourdieu provides the theory: habitus, capital, and the embodied dispositions that reproduce the old world below conscious awareness.
Information must be clean enough to act on. Gregory Bateson provides the theory: the double bind, logical types, and the communicative pathologies that prevent organisations from hearing what they need to hear.
The institutional form must be convivial enough to permit learning. Ivan Illich provides the theory: the confusion of process and substance, radical monopoly, the hidden curriculum, and the distinction between institutions that serve human purposes and institutions that replace them.

For each condition, three probes test whether it is present. A probe is not a metric. It is a question you can answer by going and looking. If the condition is present, the probes will tell you. If it is absent, the probes will show you where to look for the obstruction.

These three governing thinkers share a quality that makes them honest rather than comfortable. They are all pessimistic about the possibility of deliberate control. Bourdieu: your dispositions reproduce the old world before you are aware of it. Bateson: the communicative traps operate at logical levels you cannot access from within. Illich: the institutions you build to create learning will replace learning with the consumption of their own services. The other thinkers in each cluster provide the practices that make action possible despite these constraints. The framework says: this is harder than you think. Then it says: here is what you can do anyway.

Before the management scholars object: yes, this synthesis compresses decades of research into a practitioner framework, and the governing thinkers themselves would resist the categorisation. Illich would warn that the framework itself could become the kind of manipulative institution his work diagnoses. Bourdieu would insist that the barriers are more deeply embodied than any framework can capture. Bateson would note that the framework could become a logical-type error of the kind his work identifies. These objections are valid. What follows is offered not as the final word but as a working tool: a set of lenses that can be refined through use.

The Governing Hypothesis

Learning is a condition, not a process. It emerges when three conditions are met: identity is safe enough to change, information is clean enough to act on, and the institutional form is convivial enough to permit learning. The leader’s role is not to design or manage learning but to create and protect these conditions.

This rejects the standard transformation model where leaders design a learning programme and deliver it to staff. It also rejects the more progressive model where leaders “learn from their people” and “teach the new ways.” Both frames treat learning as something transferred between parties through a channel. The channel metaphor is the problem. Learning is not content that travels along a pipe. It is what the pattern of interaction produces when the conditions allow it.

Follett saw this a century ago. Her concept of circular response, where each party’s behaviour continuously reshapes the other’s, and her insistence that the group produces ideas that no individual could generate alone, describe learning as an emergent property of interaction, not a transfer between individuals.

Bateson provides the deepest theoretical foundation. His hierarchy of learning levels, from Learning I (correcting errors within a fixed frame) through Learning II (learning to learn, changing the frame itself) to the rare Learning III (transforming the identity of the learner), maps the territory this article covers. Most organisations operate permanently at Learning I: they correct errors without questioning the governing assumptions. The nine probes that follow are, collectively, a diagnostic for whether your organisation has the conditions for Learning II. They test whether the organisation can learn to learn.

The Identity Condition

Identity must be safe enough to change

Identity is the deepest condition. It concerns who people are, what they are worth in the field, and whether their embodied dispositions can shift. Bourdieu’s concept of habitus, the system of durable, transposable dispositions acquired through lived experience, explains why this condition is the most difficult to create. You do not choose your habitus. It is deposited in you through years of participation in a particular field: the instinctive deference to seniority, the automatic framing of problems in terms the organisation recognises, the professional reflexes that tell you what counts as good work. These are not choices. They are embodied dispositions that operate below conscious awareness.

Identity constrains everything else. If your professional capital is at stake, you will not share information that threatens it. If your habitus is calibrated to the old field, your interactions will reproduce the old patterns regardless of what the new strategy says. The condition is present when people can engage with the change without experiencing it as an existential threat to who they are and what they are worth. It is absent when people perform the new way of working while privately preserving the old.

Probe 1. Can People Tolerate Losing What They Have?

This is the probe that most transformation programmes refuse to apply. When you tell a senior professional that their role is evolving, that the skills they have spent a decade perfecting are no longer the primary source of value, you are not asking them to learn a new skill. You are asking them to accept a loss. The loss may be temporary, partial, or ultimately compensated by gains. But it is experienced as a loss, and the experience governs the response.

Bourdieu names the mechanism: hysteresis, the painful lag between a changed field and an unchanged habitus. When the rules of the game shift but your embodied dispositions remain calibrated to the old rules, the result is not discomfort. It is a crisis of capital. The professional whose cultural capital consists of a specific technical expertise faces devaluation if transformation renders that expertise secondary. Resistance to transformation is, in most cases, resistance to capital devaluation. It is entirely rational.

Giddens provides the theoretical depth. His concept of ontological security explains why resistance is disproportionate to the rational threat. Routines are not just convenient; they are psychologically necessary. Disrupting them produces existential anxiety that will express itself as resistance regardless of how compelling the business case is. Kahneman’s prospect theory explains the intensity: losses loom roughly twice as large as equivalent gains. The first steps of any change, being closest to the current reference point, produce the most acute sensitivity.

Kegan’s theory of adult development adds a dimension the other thinkers miss. The capacity to manage loss depends on developmental stage. At the socialised mind, identity is derived from the expectations of the community; a shift in what the community values is experienced as a personal identity crisis. At the self-authoring mind, the individual has an internal compass that can navigate changing expectations. Most programmes assume a self-authoring workforce. Many organisations have a predominantly socialised one.

Heifetz makes this probe actionable. People do not resist change. They resist loss. The leader’s job is not to eliminate the loss but to name it, to create the holding environment in which it can be processed, and to help people distinguish what is essential from what is expendable.

Goffman reveals how the loss is enacted socially. Transformation creates what he calls stigma: the informal labelling of those whose identity is now discredited. These labels are communicated through expressions given off: a slight pause, a glance exchanged, an invitation that does not arrive. The person whose capital has been devalued finds their contributions received differently. The colleagues who route around them are managing the interaction: including the stigmatised perspective would require everyone to adjust their performance. The result is that the people who understand most deeply what is being lost are systematically excluded from the conversations that would benefit most from their knowledge.

What to look for: Compliance without commitment. People going through the motions of the new way of working while quietly preserving the old. Passive resistance that never quite surfaces as objection. Hostility that seems disproportionate to the change being proposed. All of these are signals that the loss has not been named or addressed.

What to do: Identity transitions require three things: social support (others making the same transition), role models (people who have already made it), and narrative resources (stories that make the transition meaningful rather than diminishing). The leader must also attend to their own loss: the transition from the person who knows the answer to the person who creates the conditions for answers to emerge.

Probe 2. Is Learning Happening Through Practice, or Through Instruction?

Most organisations, when they decide to transform, create a training programme. They commission e-learning modules, hire consultants to run workshops, and build a curriculum. Then they wonder why nothing changes. This is the process fallacy in its purest form.

Habitus can only be reshaped through participation in a changed field. You cannot lecture someone into new embodied dispositions. This is not a pedagogical preference. It is a claim about what identity is and how it changes. The distinction between instruction and practice determines whether your investment in learning produces transformation or produces people who can describe transformation without being able to do it.

Illich names the deeper problem. The training programme does not merely fail to change habitus. It teaches a hidden curriculum: that learning requires a programme, that capability is certified by completion, that the organisation does not trust you to learn on your own, and that the transformation is something being done to you rather than something you are doing. This hidden curriculum directly undermines the identity condition by positioning people as recipients rather than agents.

Lave and Wenger provide the most complete account. Their concept of legitimate peripheral participation describes how expertise develops: newcomers start at the edges of a community of practice, performing simple but genuine tasks, and gradually move toward the centre as their competence grows. Learning is not something that precedes participation. It is participation. Nonaka and Takeuchi’s SECI model describes the knowledge conversion that practice-based learning requires. The critical step is externalisation, making tacit knowledge explicit, and it cannot be done in a classroom. It requires people working together on real problems, struggling to articulate what they know, and discovering through the struggle what they did not know they knew.

Giddens’s distinction between practical consciousness and discursive consciousness is the theoretical foundation. Transformation requires changing practical consciousness: the things people do without thinking. Training programmes change discursive consciousness: people learn to talk differently about their work. Talking differently does not mean working differently.

March’s technology of foolishness makes the case for sensible irrationality: sometimes you must act before you know your preferences, play before you are serious, and experiment without justification. Organisations that permit only rational, justified action will never discover anything their existing rationality could not predict.

What to look for: What percentage of your transformation budget is spent on training courses versus on giving people time, tools, and real problems to work on? Are people developing new capabilities through practice, or developing new vocabulary through instruction?

What to do: Stop trying to govern learning. Create the conditions for it: psychological safety, time, access to tools, proximity to real problems. Then pay attention to what emerges. The teams quietly solving real problems outside the governance framework are not insubordinate. They are the emergent strategy trying to tell you where the value is.

Probe 3. Do People Believe That This Time Will Be Different?

Learned helplessness is habitus. It is not a mood or a passing attitude. It is a sedimented disposition formed through repeated experience that effort does not produce results. Organisations with a history of failed change programmes have employees whose embodied orientation toward transformation has been shaped by each successive failure. “Change fatigue” is not fatigue. It is a dispositional stance, and it is rational.

If the identity condition requires that people feel safe enough to change, learned helplessness is the evidence that the condition has been destroyed by previous attempts to create it. Each failed transformation taught the same lesson: the process was run, the learning was mandated, and nothing changed. The condition was never present. Only the process was.

Seligman identifies the mechanism: explanatory style. A pessimistic style, which explains bad events as permanent, pervasive, and personal, sustains helplessness. An optimistic style, which explains the same events as temporary, specific, and external, predicts recovery. The explanatory style that dominates an organisation is cultural, reproduced in the stories people tell about previous change, in the cynicism that greets new announcements, and in the knowing looks exchanged when the latest programme is unveiled.

Bandura’s self-efficacy research provides the path to recovery. The most powerful source of belief is mastery experience: actually doing the thing and succeeding. A live demonstration where a team tackles a real problem using new methods bypasses intellectual debate. It creates the belief that the new way is possible. Csikszentmihalyi’s flow research describes the conditions under which this work becomes intrinsically engaging: clear goals, immediate feedback, and progressive challenge matched to skill.

Dweck’s research addresses the disposition directly. Organisations that celebrate talent reinforce the belief that ability is fixed, making every challenge a test. Organisations that celebrate process and effort reinforce the belief that ability is developed, making challenge an opportunity. Review what your organisation celebrates. The pattern of celebration is reshaping habitus in real time.

What to look for: Listen to the language. When people describe the transformation using the same phrases they used about the last one, learned helplessness is present. When the most talented people are not volunteering for the new work, they have made a rational calculation about where value is recognised.

What to do: Counter learned helplessness with small wins (Weick). Demonstrate through action, not argument. Find ways to make learning visible and valued before it produces measurable output. Watch for the signals: if people are engaged, curious, and arguing about how to make things better, the condition is forming. If they are filling in templates and waiting for approval, the process is running but the condition is absent.

The Information Condition

Information must be clean enough to act on

Information is the middle condition. It concerns whether accurate data can flow through the organisation, whether contradictory signals are being sent at different logical levels, and whether the organisation can distinguish its stated reality from its actual one.

Bateson’s concept of the double bind is the governing mechanism. A double bind occurs when someone receives contradictory messages at different levels of communication and cannot comment on the contradiction. “We encourage honest feedback” delivered in a meeting where honest feedback has historically produced negative consequences is a double bind. The person cannot respond to the explicit message without ignoring the implicit one, and cannot comment on the contradiction without violating the implicit message. The result is paralysis disguised as compliance.

The double bind is the link between identity and institution. It is sustained by habitus (the manager who punishes honesty does so from embodied disposition, not conscious choice) and reinforced by institutional form (the governance structure that demands both innovation and predictability). When information is dirty, it does not matter how safe identity feels or how convivial the institution is. The signals people act on are corrupted. They are navigating by a map that contradicts the landscape, and nobody can say so.

Goffman provides the micro-mechanism that maintains the double bind in daily practice. Every meeting is a performance in which participants collaborate to maintain a shared definition of the situation. The formal meeting is the front stage. The hallway conversation is the back stage. The gap between the two is not a communication failure. It is the interaction order working exactly as designed. Both parties collaborate to protect each other’s face, because a face-threatening act endangers the entire interaction. Defensive routines are not unilateral. They are mutual.

The information condition is present when the formal conversation and the shadow conversation say the same thing. It is absent when people know something to be true but cannot say it without career risk.

Probe 4. Can People Tell the Truth About What Is Happening?

Argyris identified the mechanism that corrupts organisational information more precisely than anyone else in the series: defensive routines. The gap between espoused theory and theory-in-use is a double bind that Argyris described in behavioural terms and Bateson would recognise in communicative ones.

Goffman reveals why these routines are so resistant to intervention. Training individuals in Model II behaviour works in the workshop but not in the meeting, because the meeting is a different social situation with different face-work requirements. The person trained to “test their assumptions publicly” walks into a meeting where the senior leader has just presented a strategy, and the face-work calculus kicks in: testing the assumption would threaten the leader’s face, disrupt the shared definition of the situation, and require every participant to adjust their performance. The trained behaviour collapses under the weight of the interaction order. This is why “create openness” interventions consistently fail. The safe-space announcement is itself a front-stage performance.

Schön added the concept of frame reflection: the capacity to surface the tacit frames through which parties construct their understanding. Most undiscussables are not facts that people are hiding. They are frames that people do not realise they hold. The leader who frames every initiative as cost-reduction and the practitioner who frames it as a threat to craft are not lying. They are operating within different frames that make different conclusions inevitable. Until both frames are surfaced, the disagreement is unintelligible to both parties.

At the organisational level, Westrum’s typology classifies information architectures. In pathological cultures, messengers are punished. In bureaucratic cultures, they are channelled into mechanisms designed to slow information. Only in generative cultures does information flow to where it is needed. Edmondson’s psychological safety provides the floor, with the critical caveat that safety without high standards produces comfort, not learning. Deming cuts through the abstraction: most variation is common cause, produced by the system. Blaming individuals for systemic failures teaches people to hide information about how the system works.

What to look for: The gap between what is said in formal meetings and what is said in hallway conversations. That gap is the double bind made visible. Pay particular attention to the voices from below.

What to do: Name the double bind explicitly. “I notice that we say we want honest feedback but the last person who gave it was sidelined. That contradiction is a problem I want to address.” Naming the contradiction is the first step to dissolving it, because the double bind’s power depends on the prohibition against naming.

Probe 5. Are Decision-Makers Close to the Work?

Information degrades with distance. Every layer of hierarchy between the person deciding and the person doing is a reduction in signal quality.

Drucker’s insight is foundational: the knowledge worker must define the task. The person defining the task must be intimate with the domain. Peters translated this into Management by Wandering Around: the only way to know what is going on is to go and see. Mintzberg’s research supports this: the strategist who is not touching the clay is hallucinating a strategy.

Normann’s map-landscape dialectic reveals the deepest version. Leaders distant from the work are not merely missing details. They are carrying a map that makes the real landscape invisible. Kahneman’s WYSIATI amplifies this: the coherent narrative constructed from the distant vantage point suppresses awareness of what the leader does not know.

The double bind operates here too: “We trust our teams” delivered through a governance structure requiring five layers of approval. The team receives two messages: you are trusted, and you are not trusted.

What to look for: Count the layers between the person making the transformation decision and the person doing the work. If the people designing the transformation have never done the work it changes, the quality of their decisions is structurally limited.
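The layer count can be turned into a rough illustration of how quickly signal erodes. The sketch below assumes, purely for illustration, that every layer passes on the same fixed fraction of what it receives; the function name and the 0.8 fidelity figure are hypothetical, not measurements from any organisation.

```python
def surviving_signal(layers: int, fidelity_per_layer: float) -> float:
    """Share of the original signal left after `layers` handoffs.

    Assumption (not an empirical claim): each layer independently
    preserves the same fraction of whatever reaches it, so the losses
    compound multiplicatively.
    """
    if layers < 0:
        raise ValueError("layers must be non-negative")
    if not 0.0 <= fidelity_per_layer <= 1.0:
        raise ValueError("fidelity_per_layer must be between 0 and 1")
    return fidelity_per_layer ** layers


# Compare a short chain with a long one under the same assumed fidelity.
for layers in (1, 3, 5):
    share = surviving_signal(layers, fidelity_per_layer=0.8)
    print(f"{layers} layer(s): {share:.0%} of the signal survives")
```

Under that assumption, even a generous 80 per cent fidelity per layer leaves roughly a third of the original signal after five layers. The number matters less than the shape: each added layer compounds the loss rather than adding to it.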

What to do: Cancel the steering committee. Go and watch people work. Sit with a team as they tackle a real problem. The information you need cannot travel upward through the hierarchy. You must go and get it.

Probe 6. Is the Organisation Changing What It Rewards, or Just What It Says?

This is the structural double bind: contradictory messages encoded in the institution itself. “We value innovation” (signification) while promoting those who maintain stability (legitimation) while funding the old programmes (domination). Giddens’s three dimensions of structuration must move together. When they do not, the organisation is a double bind made structural.

This is where the information condition and the identity condition meet. The habitus formed around the old reward structure generates the information pathology. People do not merely observe that the old behaviour is rewarded. Their embodied dispositions produce the old behaviour before the question of reward even arises. Weber explains the persistence: bureaucratic rationality is not a surface feature but the constitutive logic of modern institutions.

Senge’s systems thinking reveals the dynamic: the feedback loops sustaining the current structure are faster and stronger than those supporting change. Transformation often follows a pattern of initial enthusiasm and then regression: the reinforcing loops of early success are overwhelmed by the balancing loops of structural reproduction.

What to look for: Examine the last three promotions. What was actually rewarded? Examine the last three performance reviews. What was measured? Examine budget allocation. Where did the money go? These reveal the theory-in-use, regardless of the strategy deck.

What to do: You cannot change the culture by talking about culture. You change it by changing the practices: the meetings, the metrics, the promotion criteria, the budget allocation, the definition of “done.” Heifetz names the discipline: distinguish the technical work (changing the policy) from the adaptive work (changing what people value). Both are necessary. Neither alone is sufficient.

The Institutional Condition

The institutional form must be convivial enough to permit learning

Institution is the surface condition: the one the leader can most directly shape. It concerns the formal and informal structures that mediate between people and their activity. These structures either create the space from which learning emerges or consume it.

Illich’s foundational observation is that institutions designed to deliver a human capability end up replacing that capability with the consumption of institutional services. The mechanism operates in three stages. First, the institution defines the activity as requiring professional mediation: you cannot learn without a training programme. Second, the institution establishes a monopoly over delivery: learning outside the programme is not recognised. Third, the institution creates dependency: people come to believe they cannot learn without the programme. When the monopoly is complete, Illich calls it a radical monopoly: not a monopoly over competing products but a monopoly over the conditions of the activity itself.

The programme also teaches a hidden curriculum: that learning requires a programme, that capability is certified by completion, that the organisation does not trust you to learn on your own. This hidden curriculum reshapes habitus: people do not merely learn the content; they learn the dispositions the institution requires.

When the institutional response to a problem makes the problem worse, Illich calls it iatrogenesis. Clinical iatrogenesis: direct harm (governance overhead, the delay the approval process adds). Social iatrogenesis: the redefinition of normal capability as requiring institutional mediation (informal competence reclassified as “untrained”). Cultural iatrogenesis: the destruction of the capacity for autonomous action (professionals who cannot imagine learning without a programme). Each level deepens the next. The cascade is self-reinforcing.

Illich’s diagnostic distinction is between convivial and manipulative institutions. A convivial institution provides resources people can use according to their own purposes: tools, time, access to problems, access to people who know things. A manipulative institution prescribes the content, controls the sequence, measures the consumption, and produces the credential. The institutional condition is present when the structures serve the people. It is absent when the people serve the structures.

Probe 7. Does the Institutional Form Serve the People, or Do the People Serve the Institution?

This is Illich’s convivial/manipulative test applied to every element of the transformation. The programme measures completion rates, certification numbers, training hours: all measures of process consumption. None measure capability development. The teams that completed the modules can describe techniques in the language the platform provided. The teams that have been quietly using AI to solve real problems for months are uncertified. Within the programme’s logic, they are untrained. The programme has confused its own activity with the outcome it was designed to produce.

Radical monopoly deepens the diagnostic. The programme has reshaped the environment so that the outcome cannot be produced without it. The certification requirement prevents experimentation. The budget starves informal learning. The governance fills meeting agendas with programme management rather than problem-solving. The conditions under which learning would emerge naturally have been consumed by the programme.

Stacey’s gesture concept operates here. In a convivial institution, people make gestures and attend to the responses. In a manipulative institution, the institution makes gestures on people’s behalf and channels the responses through governance rather than letting them be experienced directly.

Goffman explains the stability. The governance structures are front-stage performances that produce impression-managed information. The leader who relies on governance for information about whether learning is happening is watching the performance and mistaking it for reality.

Peters provides the emotional dimension. The twelve-month analysis process is a performance of diligence that systematically prevents the only activity that produces understanding: action. Weick’s sensemaking is retrospective: waiting for understanding before acting gets the sequence backwards.

Schön names the mechanism: reflection-in-action. The practitioner engages in a reflective conversation with the situation, adjusting as the material talks back. The specification is not a document you write before work begins. It is an ongoing dialogue between intent and emergence.

What to look for: Apply Illich’s test to every element. Does this element provide resources for learning (tools, time, real problems), or prescribe and control learning (modules, sequences, certifications)? Count the convivial elements and the manipulative ones. The ratio reveals the institutional character. How many layers of approval stand between a team and an experiment?
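The count can be kept as a simple tally. The sketch below is one hypothetical way to do it: the element names and their tags are invented for illustration, and the tagging itself remains a judgement that has to be made by people close to the work.

```python
from collections import Counter

# Hypothetical inventory of programme elements, each tagged by hand.
# "convivial": provides resources people use for their own purposes.
# "manipulative": prescribes content, controls sequence, measures consumption.
ELEMENTS = {
    "sandbox access to AI tools": "convivial",
    "weekly protected experiment day": "convivial",
    "mentor office hours": "convivial",
    "mandatory e-learning modules": "manipulative",
    "certification gate before tool use": "manipulative",
    "fixed curriculum sequence": "manipulative",
    "training-hours dashboard": "manipulative",
}


def institutional_character(elements):
    """Return (convivial count, manipulative count, convivial share)."""
    counts = Counter(elements.values())
    convivial = counts["convivial"]
    manipulative = counts["manipulative"]
    total = convivial + manipulative
    share = convivial / total if total else 0.0
    return convivial, manipulative, share


convivial, manipulative, share = institutional_character(ELEMENTS)
print(f"convivial={convivial} manipulative={manipulative} share={share:.0%}")
```

The ratio is a conversation starter, not a score. Its use is in forcing the argument about which column each element belongs in.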

What to do: Create spaces where action is permitted before understanding is complete. A real problem. A real team. A day to experiment. But Heifetz adds: the instinct to provide the answer is itself a barrier. The leader’s role is to create the space, not prescribe the action. Redirect resources from institutional infrastructure toward conditions: tools, time, real problems, mentors, protected experimentation space.

Probe 8. Can the Institution Stop Doing What No Longer Works?

Every resource devoted to preserving an activity whose purpose has expired is a resource unavailable for transformation. The inability to abandon is not a failure of nerve. It is a structural feature of institutions that have optimised for their own continuation.

Illich provides the diagnosis through iatrogenesis. At the clinical level: direct dysfunction and overhead. At the social level: informal competence reclassified as “untrained,” organic adaptation reclassified as “ungoverned.” At the cultural level: the destruction of autonomous learning capacity. Each layer of institutional response deepens the dependency and consumes resources that would otherwise be available for the conditions the institution was supposed to create.

The institution resists abandonment because it generates its own justification. It measures what it does (training delivered, milestones reached) and presents these as evidence of progress. The metrics measure institutional activity, not human capability. Asking “but can the teams actually do the work?” is treated as an attack on the programme rather than a diagnostic question.

Drucker’s systematic abandonment asks the practical question: “If we were not already doing this, would we start now?” Illich goes further: has the institution made it impossible to even imagine doing without it?

March provides the theoretical foundation. Exploitation (refining what you know) produces measurable, proximate returns. Exploration (trying something genuinely new) produces ambiguous, distant returns. The manipulative institution is an exploitation machine. Exploration is structurally prevented. Christensen’s disruption theory reveals the market consequence: incumbents fail because they stay closest to the wrong customers. Taleb adds that the inability to abandon creates fragility.

What to look for: Ask Drucker’s question about your transformation programme itself. How many activities exist solely because they were created at the start? Apply Illich’s iatrogenesis test at each level. At which level is the harm deepest?

What to do: Institute a regular abandonment review. Not what to add, but what to stop. If the programme were abolished, would the organisation be able to learn? If the answer is no, the programme has achieved radical monopoly and is itself the primary barrier. The freed capacity is where the learning condition becomes possible.

Probe 9. Can the Institution Integrate Conflict, or Does It Suppress It?

When an institutional gesture produces resistance, what does the institution do with it? Most suppress it: through hierarchy (the senior person prevails), through process (a committee resolves it), or through avoidance (acknowledged and never revisited). Each method loses the divergent signal, and the institution continues with the illusion of consensus.

Illich’s framework reveals why suppression is structural. The manipulative institution cannot permit genuine conflict about its own purpose because its continuation depends on the confusion of process with substance. If the conflict were explored, someone might ask “Is this programme actually producing learning?” Weber would call this the displacement of value rationality by means-ends rationality. Illich would call it the institutional logic working as designed.

Follett, writing a century ago, saw this with clarity. Her distinction between domination, compromise, and integration is the earliest and cleanest statement of what productive conflict looks like. Integration requires surfacing the real desires beneath the stated positions. Until both desires are visible, the only available resolution is domination or compromise. Neither produces learning.

Stacey deepens this. Legitimising dissent is not a better communication technique. It is a political act requiring someone with sufficient power to make it safe for others to speak, and sufficient courage to tolerate what they say. Heifetz provides the operational discipline: the holding environment where conflict can be expressed without destroying the group. The leader regulates the temperature: too little heat and nothing changes; too much and people retreat into defensive routines.

Edmondson’s psychological safety is the floor. Without it, people will not take the interpersonal risk of expressing a divergent view. But safety alone is not sufficient. A safe team can tell each other small truths while collectively avoiding the large one.

What to look for: Watch what happens when someone disagrees in a meeting. Is the disagreement explored or managed? When the shadow conversation contradicts the formal conversation, which one changes?

What to do: Treat resistance as information. Follett’s integration requires joint study: the parties must study the situation together, not negotiate from fixed positions. The leader who responds to resistance by restating the strategy has closed the loop prematurely. The leader who responds by asking “What are you seeing that I am not?” has opened it.

How the Conditions Compound

The three conditions are not independent. They form a directional hierarchy: identity constrains information, which constrains institution. But the causation runs the other way for intervention: institutional form is where the leader acts, and sustained change in institutional form changes information flow, which over time changes identity.

Identity constrains information. If your professional capital is at stake, you will not share information that threatens it. The senior architect whose identity is built around system design will not surface data suggesting specification-writing is more valuable. Not because they are dishonest, but because their habitus filters the information before it reaches conscious awareness. They literally do not see what threatens their capital.

Information constrains institution. If the organisation is saturated with double binds, the institutional response will be manipulative, because the institution cannot resolve contradictions it cannot name. It adds governance to manage contradictions rather than dissolving them. Every unresolved double bind generates institutional complexity: another committee, another reporting line. Illich’s iatrogenic cascade operates here: the institutional response to the information pathology deepens the information pathology.

But institution is where the leader acts. You do not reshape identity directly. You do not resolve double binds directly. You shape institutional form. And sustained institutional change, if genuinely convivial, reshapes information flow (because convivial institutions do not produce double binds), which over time reshapes identity (because people participating in a convivial institution develop different habitus from those trapped in a manipulative one). The conversion from manipulative to convivial is the leader’s primary institutional act.

This is why transformation takes so long and why it so often fails. Institutional form can change relatively quickly. Information environments change slowly. Identity structures change very slowly. A transformation that changes only institutional form will feel productive but will not persist if the information and identity conditions remain unchanged. One that addresses all three simultaneously has a chance.

It also explains why learning programmes fail. They are manipulative institutions applied to a condition problem. If the conditions are absent, the process runs and nothing changes. If the conditions are present, the process is unnecessary. Illich goes further: the programme does not merely fail to produce the condition. It actively prevents it from emerging, by establishing a radical monopoly over the definition of learning.

Ackoff would insist: these barriers constitute a mess, not a collection of problems. A mess is a system of problems that cannot be solved individually. Addressing one probe while leaving the others untouched produces the appearance of progress within a system that has not actually changed.

Bringing It All Together

The leader’s role is to shape the institutional form so that it creates rather than consumes the conditions for learning. This is not institutional design in the conventional sense. Stacey’s warning applies: there is no position outside the institution from which to design it. What the leader can do is participate skilfully: noticing which conditions are absent, making gestures that create the missing condition, and having the courage to not know in advance what will emerge.

Heifetz operationalises this as the oscillation between the balcony, where patterns become visible, and the dance floor, where they are lived. Normann adds the conceptual dimension: question whether the map itself is correct. Kahneman warns that the leader’s own biases will make all of this harder than it sounds: System 1 will generate a coherent narrative, confidence will feel like evidence, and the losses entailed by change will loom larger than the gains.

Bateson provides the deepest frame. Most organisations operate at Learning I. The nine probes are a diagnostic for the conditions required for Learning II: learning to learn, changing the framework itself. This is where real transformation occurs. It is also where anxiety is highest, because the framework being questioned is the one that provides the organisation’s identity, coherence, and sense of purpose.

Bourdieu explains why this is so hard. The framework is not just an intellectual structure. It is inscribed in bodies, reflexes, and taken-for-granted assumptions. Changing it requires changing habitus, and habitus changes only through sustained practice in a changed field. There are no shortcuts.

Illich provides the pivot between diagnosis and action. His framework explains why the institutional response to the difficulty of learning, building a learning programme, reliably makes things worse. The leader who understands this will stop asking “How do I design a better learning programme?” and start asking “How do I create the conditions from which learning emerges, and then get out of the way?” The practical test is the convivial/manipulative distinction: for every element of the institutional form, ask whether it provides resources for self-directed activity or prescribes, controls, and credentials. Redirect from the manipulative toward the convivial. Protect the teams already learning outside the programme. Make informal learning visible and valued.

The probes are not specific to any particular transformation. They are the conditions for any organisational learning. What makes them urgent now is that the cost of not having them has become impossible to ignore. Organisations where the conditions are present will adapt with intention, craft, and responsiveness. Organisations where they are absent will absorb the tools while preserving the structures, capture the terminology while avoiding the transformation, and emerge essentially unchanged.

The barriers are not new. They were diagnosed decades ago. What is new is the cost of leaving them in place.

Further Reading

Pierre Bourdieu, The Logic of Practice (1990). Bourdieu and Wacquant, An Invitation to Reflexive Sociology (1992) is more accessible. Bourdieu, The Forms of Capital (1986) is freely available.

Gregory Bateson, Steps to an Ecology of Mind (1972). The essays on learning levels and the double bind.

Ivan Illich, Deschooling Society (1971). Freely available as PDF. Tools for Conviviality (1973) develops the convivial/manipulative distinction.

Chris Argyris, Overcoming Organizational Defenses (1990). Erving Goffman, The Presentation of Self in Everyday Life (1959). Karl Weick, Sensemaking in Organizations (1995). Mary Parker Follett, Creative Experience (1924). Freely available. Ronald Heifetz, The Practice of Adaptive Leadership (2009). Ralph Stacey, Complex Responsive Processes in Organizations (2001). Russell Ackoff, Ackoff’s Best (1999).

Ivan Illich: When The Cure Produces the Disease

Justin Arbuckle — Mon, 06 Apr 2026 07:01:50 GMT

Your transformation programme has a learning component. It has modules, certifications, completion rates, and a dashboard that shows how many people have been trained. The numbers look good. And when you walk the floors, the people who completed the training are not doing anything differently. They have the certificate. They attended the sessions. They can speak the language. But the actual work is unchanged. The programme has produced graduates without producing learning.

Ivan Illich would not be surprised. His argument, developed most forcefully in Deschooling Society (1971) and Tools for Conviviality (1973), is that this outcome is not a failure of the programme. It is the programme working exactly as institutions work. The institution designed to produce learning replaces learning with the consumption of its own services. The certificate becomes the goal. Attendance becomes the evidence. The dashboard becomes the product. And the thing the institution was created to enable, genuine change in practice, is quietly displaced by the thing it actually produces: compliance with its own requirements. Illich called this institutional inversion: the point at which an institution becomes counterproductive to its own stated purpose. It is the theoretical foundation for every observation in this series about structures that obstruct the work they were designed to serve, and it governs the Interaction dimension of the entire Learning phase.

1. Institutional Inversion: When the Structure Serves Itself

Illich’s sharpest insight is that institutions systematically confuse the process they administer with the outcome they were created to achieve. Schooling is confused with education. Treatment is confused with health. Attending a course is confused with learning. The institutional process, because it is visible, measurable, and governable, displaces the substantive outcome, which is often none of these things.

This confusion is structural, not accidental. The institution must justify its existence, its budget, its headcount. It does so by measuring what it can control: enrolment, completion, certification. The substance, genuine capability change, is harder to measure and slower to materialise. Over time, the metrics of process become the definition of success, and anyone who questions the equation (“but are people actually learning?”) is treated as questioning the institution itself.

Institutional inversion explains all three Interaction probes in this series. Structures obstruct when they have inverted: the governance framework that was created to enable AI adoption now prevents experimentation, because the framework’s survival depends on the continuation of the problem it was designed to solve. The organisation cannot stop what no longer works because the activity has become self-justifying: the architecture review board continues to meet because the board exists, and the board exists because it meets. Conflict cannot be integrated because the institution’s survival depends on suppressing challenges to its own logic: the team that succeeds without the programme threatens the programme’s necessity, so their success is either absorbed (retrospectively certified, claimed as a programme outcome) or ignored (it happened outside the framework, so it does not count).

Bateson’s levels framework diagnoses the epistemological damage. The programme operates at Learning I: it teaches new procedures and vocabulary within the existing frame. What the organisation needs is Learning II: a change in how people learn, a shift in the context that governs how they generate new practice. But Learning II cannot be taught. It emerges from sustained engagement with real problems in conditions where old patterns are genuinely insufficient. The programme, by providing a structured path through pre-digested content, actively prevents the disorientation from which Learning II arises. Argyris diagnosed the same mechanism: the programme’s structure is itself a defensive routine, protecting the organisation from the anxiety of not knowing whether people are actually learning by substituting a measurable process for an unmeasurable outcome.

2. The Hidden Curriculum and Radical Monopoly

Every institution, Illich argued, teaches two things. The official curriculum is the content listed in the syllabus. The hidden curriculum is the implicit lesson about the proper relationship between the learner and the institution: that learning requires a programme, that progress is measured by certification, that the centre of excellence decides what is legitimate, and that the practitioner’s role is to consume, not to experiment independently.

Bourdieu would recognise this as symbolic violence: the hidden curriculum is internalised by the people it constrains, so that the programme’s authority appears natural. The developer who waits for the centre of excellence to approve a new approach before trying it has learned the hidden curriculum perfectly. They are not being cautious. They have been taught that unsanctioned action is illegitimate. The programme has produced the dependency it was designed to prevent.

Illich’s most powerful concept deepens this: radical monopoly, the condition in which an institution has so thoroughly colonised the activity it serves that the activity cannot be imagined without the institution. The car restructured cities so that walking became impractical. The hospital redefined health so that self-care became insufficient by definition. The AI transformation programme redefines competence so that only programme-certified practitioners are considered competent, regardless of what they can actually do. The test is brutal: if you abolished the programme tomorrow, would the organisation be able to adopt AI? If the answer is no, the programme has achieved radical monopoly. The conditions for independent learning have been consumed by the institution that was supposed to create them.

Stacey’s warning connects: formalising communities of practice kills them, because formalisation converts a living pattern of interaction into a managed process. Illich generalises this: every convivial activity, when captured by an institution, undergoes the same conversion. The informal learning that was already happening, the corridor conversation, the team that figured out AI on a real problem without permission, does not survive institutionalisation. Weick’s small wins are the natural enemy of radical monopoly: every informal success demonstrates that capability exists without the institution. Peters’ bias for action is the antidote, but only if the organisation protects the space for unsanctioned success.

3. Iatrogenesis: The Three Levels of Institutional Harm

Illich’s study of medicine introduced iatrogenesis: harm caused by the institution designed to help. He identified three levels that apply to transformation with disturbing precision.

Clinical iatrogenesis is direct harm. The governance framework creates so much overhead that teams avoid using AI for anything requiring approval. The centre of excellence becomes a bottleneck that prevents practice entirely. Beer would recognise this instantly: the purpose of the system is what it does, and what this system does is prevent the adoption it was created to enable. Beer governs the Interaction lever in the Deciding phase because he provides the architecture (the Viable System Model) that diagnoses and corrects inversion. Illich diagnoses the pathology. Beer provides the remedy. POSIWID is the diagnostic that connects them: if the programme is producing programme artefacts but not changed practice, then producing artefacts is its purpose.

Social iatrogenesis is the redefinition of normal activity as requiring institutional mediation. The developer who taught themselves AI effectively is invisible because they have no certificate. The team that built a working integration is unrecognised because they did not follow the approved methodology. Normal professional development, learning by doing, has been reclassified as insufficient.

Cultural iatrogenesis is the deepest. The programme destroys the organisation’s capacity for autonomous learning. After years of governance frameworks, approved tool lists, and mandatory training paths, people have lost the disposition to learn independently. They wait to be told. They wait for the programme. They have internalised the hidden curriculum so completely that learning without institutional mediation seems irresponsible. This is the deepest damage, and it is largely invisible, because the people who have suffered it do not know that anything has been taken from them.

Giddens’s structuration theory explains why cultural iatrogenesis is so resistant to correction. The programme is not merely a set of rules. It is a structure reproduced through daily practice: the habitual consultation of the approved tool list, the automatic referral to the centre of excellence, the reflex to check governance before acting. Bourdieu would say the programme has inscribed itself in the habitus. The dependency is no longer institutional. It is embodied.

4. From Programme to Conditions

The question every transformation leader must ask is not “How do I build a better programme?” but “How do I create the conditions from which learning emerges without a programme?”

Illich called the alternative a convivial institution: one that provides tools and access without dictating use, that increases the capacity for autonomous action rather than replacing it with managed consumption. Heifetz’s holding environment is the leadership practice that creates convivial conditions: holding the distress of not knowing, protecting the space for experimentation, giving the work back to the people who must do it. Weick’s small wins are the mechanism: concrete, visible changes that establish new practice before the old institutional logic can reassert itself. Drucker’s systematic abandonment is the discipline: regularly asking “if we were not already running this programme, would we start it now?”

The forward connection to the Deciding phase is direct. Illich diagnoses the pathology of institutional inversion: the point at which structures become counterproductive. Beer, who governs the Interaction lever in the Deciding phase, provides the cybernetic architecture that prevents or corrects the inversion. His Viable System Model ensures that each part of the organisation has the autonomy to respond to its environment while remaining coordinated with the whole. POSIWID (the Purpose Of a System Is What It Does) is Beer’s operationalisation of Illich’s diagnosis: if the system is producing programme artefacts, not learning, then the system’s purpose is programme artefacts, regardless of what the strategy says. Redesign the system, not the communication plan.

The deepest lesson Illich offers this series is that the institutional form itself, the programme, the centre of excellence, the governance framework, is not a neutral container for transformation. It is an active force that shapes what transformation can become. If the institution is convivial, it amplifies autonomous capability. If it has inverted, it replaces autonomous capability with institutional dependency. Every structure in the organisation is doing one or the other, at every moment, in every interaction. The leader’s task is to know which.

(An Organisational Prompt is something you can do now....)

Organisational Prompt

Run the iatrogenesis diagnostic on one programme, one initiative, one governance structure created to support your transformation.

Clinical: is the programme directly preventing the thing it was designed to enable? Are there teams that would adopt AI faster if the programme did not exist?

Social: has the programme redefined competence so that only programme-certified activity counts? Are there practitioners who have taught themselves effectively but are invisible because they are outside the framework?

Cultural: have people lost the disposition to learn independently? Do teams wait for permission, for the approved tool, for the centre of excellence to publish guidance, before trying something new? If you abolished the programme tomorrow, would your people know how to start?

If you find iatrogenesis at any level, the response is not to fix the programme. It is to ask Drucker’s question: “If we were not already doing this, would we start now?” And if the answer is no, to have the courage to stop, and to trust that the conditions for learning are more productive than the institution that has been consuming them.

Further Reading

Ivan Illich, Deschooling Society (1971). The foundational text. Short, radical, and immediately applicable beyond education. Freely available as a PDF.

Ivan Illich, Tools for Conviviality (1973). The more general statement: the distinction between convivial and manipulative institutions, and the criteria for assessing which a given institution has become. Freely available as a PDF.

Ivan Illich, Medical Nemesis: The Expropriation of Health (1976). The iatrogenesis argument. Demonstrates the pattern across domains: the institution replaces the activity it was designed to support with the consumption of its own services.

Paulo Freire, Pedagogy of the Oppressed (1970). The complementary critique. The “banking model” of education is the pedagogy of Illich’s manipulative institution. Problem-posing education is what convivial learning looks like in practice.