Technology Links

These are online links to technical sources in the areas of DevOps, Software Engineering, Domain-Driven Design, Specification-Driven Development and AI that are referenced in either published or upcoming articles. These links will be updated as new articles are released.


Specification-Driven Development

Standards and Foundations

JSON Schema -- The foundational vocabulary for describing the structure of data. The constraint language underlying most specification standards and LLM structured output features. Referenced in The Clarity Problem.
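As a flavour of the constraint vocabulary, here is a minimal sketch in Python using only the standard library: a small schema for an invented order payload, and a hand-rolled check covering just the keywords shown (`type`, `required`, `properties`, `minimum`). A real validator such as the `jsonschema` package covers the full vocabulary; this is illustrative only.

```python
# A minimal JSON Schema subset describing an invented order payload.
schema = {
    "type": "object",
    "required": ["orderId", "quantity"],
    "properties": {
        "orderId": {"type": "string"},
        "quantity": {"type": "integer", "minimum": 1},
    },
}

def validate(instance, schema):
    """Hand-rolled check for only the keywords used above."""
    errors = []
    if schema.get("type") == "object" and not isinstance(instance, dict):
        return ["expected object"]
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing required property: {key}")
    for key, sub in schema.get("properties", {}).items():
        if key not in instance:
            continue
        value = instance[key]
        if sub.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{key}: expected string")
        if sub.get("type") == "integer" and not isinstance(value, int):
            errors.append(f"{key}: expected integer")
        if "minimum" in sub and isinstance(value, int) and value < sub["minimum"]:
            errors.append(f"{key}: below minimum {sub['minimum']}")
    return errors

print(validate({"orderId": "A-1", "quantity": 2}, schema))  # []
print(validate({"quantity": 0}, schema))
```

The same keyword vocabulary underlies both OpenAPI request/response schemas and the structured-output features of current LLM APIs.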

OpenAPI 3.1 -- The dominant open standard for REST API specification. The interoperability layer that allows AI-generated APIs to be consumed by any compliant client. Referenced in The Clarity Problem and The Domain Problem.

AsyncAPI Specification -- The open standard for event-driven and message-based system specification. Extends OpenAPI patterns to publish-subscribe, message queue, and streaming architectures. Referenced in The Clarity Problem and The Domain Problem.

GraphQL Schema Definition Language -- GraphQL is inherently specification-driven: the SDL is required, not optional. The schema is the specification, and GraphQL introspection makes it discoverable at runtime. Referenced in The Clarity Problem.

Protocol Buffers (gRPC) -- Google’s language-neutral, platform-neutral mechanism for serialising structured data. Defines services and message types for RPC-based architectures. Referenced in The Clarity Problem.

Pact Contract Testing -- Consumer-driven contract testing. The safety net that ensures iterative specification changes do not break downstream dependencies. Referenced in The Clarity Problem.
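The core idea can be sketched without the Pact API itself: the consumer records the response shape it depends on, and the provider's build replays that expectation against its own handler. Everything below (the contract format, handler, and field names) is an invented conceptual sketch, not Pact's actual file format or verifier.

```python
# Conceptual consumer-driven contract: the shape the consumer relies on.
contract = {
    "request": {"method": "GET", "path": "/orders/42"},
    "response": {"status": 200, "body": {"orderId": "str", "status": "str"}},
}

def provider_handler(path):
    # Stand-in for the real provider; Pact's verifier replays the
    # contract against the actual running service.
    return 200, {"orderId": "42", "status": "shipped"}

def verify(contract, handler):
    """Check the provider still satisfies the consumer's recorded shape."""
    status, body = handler(contract["request"]["path"])
    expected = contract["response"]
    if status != expected["status"]:
        return False
    type_names = {"str": str, "int": int}
    return all(
        isinstance(body.get(field), type_names[tname])
        for field, tname in expected["body"].items()
    )

print(verify(contract, provider_handler))  # True
```

Because the contract is generated from the consumer's tests, a provider cannot unknowingly break a downstream dependency by changing a field the consumer relies on.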

Tooling and IDEs

GitHub Spec Kit -- Open-source toolkit for specification-driven development. Agent-agnostic CLI with structured commands (/specify, /plan, /tasks) that work across Copilot, Claude Code, Gemini CLI, Cursor and other AI coding agents. Introduces the concept of a constitution: non-negotiable project principles that constrain specifications. Referenced in The Clarity Problem.

AWS Kiro -- Agentic IDE built on Code OSS that embeds specification-driven development into the development environment. Generates EARS-notation acceptance criteria, technical designs and implementation task lists from natural language requirements. Includes event-driven “hooks” that automatically update tests and documentation when code changes. Referenced in The Clarity Problem.

JUXT Allium -- LLM-native behavioural specification language. Describes events, preconditions and outcomes in formal syntax designed for AI consumption. Includes elicitation and distillation workflows for building and extracting specifications. Addresses the gap between what structural schemas can express and what behavioural specifications require. Referenced in The Clarity Problem.

Methods and Practice

EARS Notation -- Alistair Mavin. The Easy Approach to Requirements Syntax. Keyword-based patterns (When, While, Where, If/Then) for writing precise, testable natural-language requirements. Originally developed at Rolls-Royce for airworthiness certification; adopted by Kiro for AI-assisted specification. Referenced in The Clarity Problem.
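The keyword patterns are simple enough to classify mechanically. The sketch below labels a requirement sentence with its EARS pattern; the pattern names follow Mavin's taxonomy, but the example sentences are invented.

```python
def ears_pattern(requirement: str) -> str:
    """Classify a requirement sentence by its leading EARS keyword."""
    text = requirement.strip()
    if text.startswith("When "):
        return "event-driven"
    if text.startswith("While "):
        return "state-driven"
    if text.startswith("Where "):
        return "optional feature"
    if text.startswith("If ") and ", then " in text:
        return "unwanted behaviour"
    return "ubiquitous"  # "The <system> shall ..." with no trigger

print(ears_pattern("When the user submits the form, the system shall validate all fields."))
print(ears_pattern("If the payment gateway times out, then the system shall queue the order."))
```

This determinism is what makes EARS attractive for AI-assisted specification: each pattern maps cleanly to a testable trigger-and-response pair.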

Spec-Driven Development -- ThoughtWorks. Analysis of SDD as an emerging practice, including the maturity progression from spec-first through spec-led to spec-as-source development. Referenced in The Clarity Problem.

Adopting an API-First Approach -- Swagger/SmartBear. The case for designing APIs before building them. Referenced in The Clarity Problem.

Introducing Structured Outputs -- OpenAI. How JSON Schema is used in constrained decoding to guarantee structural validity in AI-generated content. Referenced in The Clarity Problem.


Domain-Driven Design and Domain Discovery

Strategic Design

Eric Evans: Domain-Driven Design Reference -- Updated pattern summaries (2015) reflecting a decade of evolution in DDD practice. The concise companion to the foundational text. Referenced in The Domain Problem.

Eric Evans on DDD and LLMs (InfoQ) -- Report on Evans’ keynote at Explore DDD 2024 (March 2024), including his argument that a trained language model is a bounded context. Referenced in The Domain Problem.

Martin Fowler: Bounded Context -- Concise overview of the bounded context concept with links to Evans and Vernon.

Context Mapper -- Open-source DSL for DDD context mapping. Generates graphical context maps, PlantUML diagrams, and service contracts from a domain model. Referenced in The Domain Problem.

Domain Discovery Methods

Alberto Brandolini: Introducing EventStorming -- The definitive guide to EventStorming from its creator (Leanpub, in progress). Covers big picture, process-level, and software design sessions with worked examples. Essential for domain discovery workshops. Referenced in The Domain Problem.

EventStorming.com -- Brandolini’s site with introductory resources, blog posts, and community links for the EventStorming method.

Domain Storytelling -- The companion site for the domain storytelling technique by Stefan Hofer and Henning Schwentner. Pictographic narrative method for domain discovery, particularly effective with domain experts who prefer structured storytelling to workshop chaos. Referenced in The Domain Problem.

Wardley Mapping -- Simon Wardley’s strategic value chain mapping method. Components positioned on an evolution axis (genesis, custom, product, commodity) to inform build-vs-buy and investment decisions. Connects directly to Evans’ Core Domain distillation. Referenced in The Domain Problem.

Architecture and Teams

Matthew Skelton and Manuel Pais: Team Topologies -- Extends Evans’ bounded contexts into organisational design. Stream-aligned teams, platform teams, enabling teams, and the cognitive load principle. The bridge between technical architecture and team structure. Referenced in The Domain Problem.


AI Interoperability and Agentic Systems

Agent Integration Protocols

Model Context Protocol (MCP) -- Anthropic. The open standard for connecting AI applications with external tools, databases, and services. Standardises how an AI agent accesses the internal resources of its bounded context: data sources, validation tools, domain-specific functions. The agent-to-tool integration layer. Referenced in The Clarity Problem and The Domain Problem.

Agent2Agent Protocol (A2A) -- Google. The open protocol for AI agent interoperability, now a Linux Foundation project with 50+ technology partners (Atlassian, Box, Salesforce, SAP, ServiceNow, and others). Agent Cards (JSON metadata) advertise capabilities; structured task lifecycle management handles creation, progress, completion, and failure. Standardises agent-to-agent communication between bounded contexts. Referenced in The Domain Problem.
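Since an Agent Card is plain JSON metadata, publishing one is just serialisation. The sketch below is a hypothetical card: the field names approximate those in A2A's published examples and may differ from the current schema, and the agent, URL, and skill are invented.

```python
import json

# Hypothetical Agent Card; field names are illustrative, not authoritative.
agent_card = {
    "name": "order-fulfilment-agent",
    "description": "Handles order placement and shipment tracking.",
    "url": "https://agents.example.com/fulfilment",
    "capabilities": {"streaming": True},
    "skills": [
        {"id": "track-shipment",
         "description": "Return shipment status for an order ID."}
    ],
}

# A client discovers the agent by fetching and parsing this metadata.
payload = json.dumps(agent_card)
print(json.loads(payload)["skills"][0]["id"])  # track-shipment
```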

A2A Launch Announcement -- Google Developers Blog (April 2025). The design rationale and partner ecosystem for the Agent2Agent Protocol.

Agent Communication Protocol (ACP) -- IBM Research (2024). Focuses on semantic understanding between agents, requiring shared ontologies for high-level coordination. Referenced in The Domain Problem.

Open Protocols for Agent Interoperability -- AWS Open Source Blog (May 2025). Technical series on MCP and A2A implementation, including AWS’s commitment to both standards.

Multi-Agent Systems

Generative Agents: Interactive Simulacra of Human Behavior -- Park, J.S., et al. (2023). Multi-agent architectures that simulate human social behaviour using LLMs.

Improving Factuality and Reasoning in Language Models through Multiagent Debate -- Du, Y., et al. (2023). ICML 2024. LLMs debating each other converge on more accurate answers.

Debating with More Persuasive LLMs Leads to More Truthful Answers -- Khan, A., et al. (2024). Even non-expert judges can identify truth when AI debaters argue opposing positions.

Multi-Agent Debate Framework for Fact Verification -- Tool-MAD (2025). Structured debate with tool access for fact-checking.

AI Safety via Debate -- OpenAI (2018). The original proposal for debate as an alignment strategy.

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks -- Lewis, P., et al. (2020). NeurIPS 2020. Grounding generation in retrieved evidence.

Factored Cognition -- Ought. Decomposing complex reasoning into verifiable sub-tasks.


Foundation Models: Architecture and Scaling

Attention Is All You Need -- Vaswani, A., et al. (2017). NeurIPS 2017. The transformer architecture paper.

Scaling Laws for Neural Language Models -- Kaplan, J., et al. (2020). The power-law relationship between compute, data, parameters and performance.

Training Compute-Optimal Large Language Models (Chinchilla) -- Hoffmann, J., et al. (2022). NeurIPS 2022. Revised scaling laws showing data matters as much as parameters.

Emergent Abilities of Large Language Models -- Wei, J., et al. (2022). Capabilities that appear at scale without being explicitly trained.


World Models and Representations

Emergent World Representations -- Li, K., et al. (2023). Evidence that sequence models trained on game transcripts learn the underlying board state.

Language Models Represent Space and Time -- Gurnee, W., Tegmark, M. (2024). Linear representations of spatial and temporal information inside LLMs.

A Path Towards Autonomous Machine Intelligence -- LeCun, Y. (2022). Meta AI. The case for world models as the foundation of machine intelligence.

V-JEPA: Revisiting Feature Prediction for Learning Visual Representations from Video -- Assran, M., et al. (2024). Meta AI. Self-supervised visual world models.

LeJEPA: Provable and Scalable Self-Supervised Learning -- Balestriero, R., LeCun, Y. (2025). Theoretical foundations for joint-embedding predictive architectures.


Interpretability and Mechanistic Understanding

Toy Models of Superposition -- Elhage, N., et al. (2022). Anthropic. How neural networks represent more concepts than they have dimensions.

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet -- Templeton, A., et al. (2024). Anthropic. Extracting human-interpretable features from production language models.

Discovering Latent Knowledge in Language Models Without Supervision -- Burns, C., et al. (2022). Probing for truth representations independent of surface-level text patterns.

The Reversal Curse -- Berglund, L., et al. (2023). Models trained on “A is B” fail to infer “B is A”; a structural constraint on how retained patterns shape future inference.

Elicit Machine Learning Reading List -- A structured curriculum for understanding foundation models, from fundamentals to frontier research. Sections on interpretability, uncertainty, reinforcement learning, debate, task decomposition, and tool use.


Hallucination, Uncertainty and Calibration

Calibrated Language Models Must Hallucinate -- Kalai, A.T., Vempala, S.S. (2024). STOC 2024. The mathematical proof that hallucination is a structural property, not a fixable bug.

Why Language Models Hallucinate -- Kalai, A.T., et al. (2025). Extended analysis of the structural roots of hallucination.

Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning -- Wen, Y., et al. (2024). Training models to say “I don’t know” when they don’t know.

Extending Epistemic Uncertainty Beyond Parameters -- Bálint, D., et al. (2025). Separating what the model knows from what it does not.


Reasoning, Verification and Chain-of-Thought

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models -- Wei, J., et al. (2022). Intermediate reasoning steps improve performance on complex tasks.

Self-Consistency Improves Chain of Thought Reasoning -- Wang, X., et al. (2023). Sampling multiple reasoning paths and selecting the most consistent answer.

Tree of Thoughts: Deliberate Problem Solving with Large Language Models -- Yao, S., et al. (2023). NeurIPS 2023. Branching search through reasoning paths.

Let’s Verify Step by Step -- Lightman, H., et al. (2023). OpenAI. Process-based supervision outperforms outcome-based supervision for mathematical reasoning.

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning -- DeepSeek-AI (2025). Reasoning capabilities emerging from reinforcement learning without explicit chain-of-thought training.


Alignment, Reward and Specification Gaming

Deep Reinforcement Learning from Human Preferences -- Christiano, P., et al. (2017). NeurIPS 2017. The foundational RLHF paper.

Training Language Models to Follow Instructions with Human Feedback (InstructGPT) -- Ouyang, L., et al. (2022). NeurIPS 2022. RLHF applied to language models at scale.

Direct Preference Optimization -- Rafailov, R., et al. (2023). NeurIPS 2023. Alignment without explicit reward modelling.

Constitutional AI: Harmlessness from AI Feedback -- Bai, Y., et al. (2022). Anthropic. Self-supervision against a constitution of principles.

Scalable Agent Alignment via Reward Modeling -- Leike, J., et al. (2018). The research agenda for scalable alignment through recursive reward modelling.

Defining and Characterizing Reward Hacking -- Skalse, J., et al. (2022). NeurIPS 2022. Formal framework for when optimisation against a proxy diverges from the true objective.

Specification Gaming: The Flip Side of AI Ingenuity -- Krakovna, V., et al. (2020). DeepMind. Catalogue of examples where AI systems find unintended shortcuts.

Specification Gaming Examples Database -- Krakovna, V., et al. Comprehensive spreadsheet of specification gaming instances across reinforcement learning.

Alignment Faking in Large Language Models -- Greenblatt, R., et al. (2024). Anthropic. Evidence that models can strategically comply with training objectives while preserving different internal preferences.

Language Models Learn to Mislead Humans via RLHF -- Wen, Y., et al. (2024). How optimising for human approval can produce sycophancy rather than truthfulness.


Reinforcement Learning and Strategic Planning

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (MuZero) -- Schrittwieser, J., et al. (2020). Nature 588, 604-609. Learning world models and using them for planning.

A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go Through Self-Play (AlphaZero) -- Silver, D., et al. (2018). Science 362(6419). Discovery of strategies no human had found, through structurally supported exploration.


DevOps and Software Delivery

DORA (DevOps Research and Assessment) -- The research programme behind the four key metrics (deployment frequency, lead time, change failure rate, mean time to restore). Empirical validation of Westrum’s information flow typology in software delivery contexts. Referenced in The Culture Problem.
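The four key metrics are straightforward to compute from a deployment log. The sketch below uses a small invented log; the record fields (`deployed`, `committed`, `failed`, `restored`) and the log format are assumptions, not a DORA-prescribed schema.

```python
from datetime import datetime, timedelta

# Invented 28-day deployment log for illustration.
deployments = [
    {"deployed": datetime(2026, 1, 5), "committed": datetime(2026, 1, 4), "failed": False},
    {"deployed": datetime(2026, 1, 12), "committed": datetime(2026, 1, 10), "failed": True,
     "restored": datetime(2026, 1, 12, 2)},
    {"deployed": datetime(2026, 1, 19), "committed": datetime(2026, 1, 18), "failed": False},
    {"deployed": datetime(2026, 1, 26), "committed": datetime(2026, 1, 24), "failed": False},
]

period_days = 28
deployment_frequency = len(deployments) / period_days  # deploys per day
lead_time = sum((d["deployed"] - d["committed"] for d in deployments),
                timedelta()) / len(deployments)
failures = [d for d in deployments if d["failed"]]
change_failure_rate = len(failures) / len(deployments)
mttr = sum((d["restored"] - d["deployed"] for d in failures),
           timedelta()) / len(failures)

print(f"{deployment_frequency:.2f}/day, lead time {lead_time}, "
      f"CFR {change_failure_rate:.0%}, MTTR {mttr}")
```

For this log: one deploy a week, an average lead time of a day and a half, a 25% change failure rate, and two hours to restore.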


Last updated: February 2026

This page is maintained alongside the Organisational Prompts series. Links are added as new articles are published.