Technology Links
Links to online technical sources in DevOps, Software Engineering, Domain-Driven Design, Specification-Driven Development and AI, referenced in published or upcoming articles. These links will be updated as new articles are released.
Last updated: 5 April 2026
Specification-Driven Development
Standards and Foundations
JSON Schema -- The foundational vocabulary for describing the structure of data. The constraint language underlying most specification standards and LLM structured output features.
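A minimal illustration of the vocabulary (the field names are invented for illustration):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "orderId": { "type": "string", "format": "uuid" },
    "quantity": { "type": "integer", "minimum": 1 }
  },
  "required": ["orderId", "quantity"],
  "additionalProperties": false
}
```

The same constraint language appears in OpenAPI request bodies, LLM structured-output declarations, and standalone validation.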
OpenAPI 3.1 -- The dominant open standard for REST API specification. The interoperability layer that allows AI-generated APIs to be consumed by any compliant client.
AsyncAPI Specification -- The open standard for event-driven and message-based system specification. Extends OpenAPI patterns to publish-subscribe, message queue, and streaming architectures.
GraphQL Schema Definition Language -- GraphQL is inherently specification-driven: the SDL is required, not optional. The schema is the specification, and GraphQL introspection makes it discoverable at runtime.
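A sketch of what the SDL looks like (types invented for illustration):

```graphql
# The schema is the specification: client and server both depend on it,
# and introspection exposes it at runtime.
type Query {
  order(id: ID!): Order
}

type Order {
  id: ID!
  quantity: Int!
  status: OrderStatus!
}

enum OrderStatus {
  PENDING
  SHIPPED
}
```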
Protocol Buffers (gRPC) -- Google’s language-neutral, platform-neutral mechanism for serialising structured data. Defines services and message types for RPC-based architectures.
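An illustrative proto3 definition (service and message names are invented):

```proto
syntax = "proto3";

package orders.v1;

// A service definition: the contract both client and server stubs
// are generated from.
service OrderService {
  rpc GetOrder (GetOrderRequest) returns (Order);
}

message GetOrderRequest {
  string order_id = 1;
}

message Order {
  string order_id = 1;
  int32 quantity = 2;
}
```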
Pact Contract Testing -- Consumer-driven contract testing. The safety net that ensures iterative specification changes do not break downstream dependencies.
Tooling and IDEs
GitHub Spec Kit -- Open-source toolkit for specification-driven development. Agent-agnostic CLI with structured commands (/specify, /plan, /tasks) that work across Copilot, Claude Code, Gemini CLI, Cursor and other AI coding agents. Introduces the concept of a constitution: non-negotiable project principles that constrain specifications.
AWS Kiro -- Agentic IDE built on Code OSS that embeds specification-driven development into the development environment. Generates EARS-notation acceptance criteria, technical designs and implementation task lists from natural language requirements. Includes event-driven “hooks” that automatically update tests and documentation when code changes.
JUXT Allium -- LLM-native behavioural specification language. Describes events, preconditions and outcomes in formal syntax designed for AI consumption. Includes elicitation and distillation workflows for building and extracting specifications. Addresses the gap between what structural schemas can express and what behavioural specifications require.
Methods and Practice
EARS Notation -- Alistair Mavin. The Easy Approach to Requirements Syntax. Keyword-based patterns (When, While, Where, If/Then) for writing precise, testable natural-language requirements. Originally developed at Rolls-Royce for airworthiness certification; adopted by Kiro for AI-assisted specification.
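Illustrative requirements in the core EARS patterns (the avionics examples are invented, not drawn from the Rolls-Royce corpus):

```text
Ubiquitous:   The control system shall log every mode change.
Event-driven: When the landing gear is deployed, the system shall
              illuminate the gear-down indicator.
State-driven: While in maintenance mode, the system shall inhibit
              engine start.
Unwanted:     If the sensor signal is lost, then the system shall
              revert to the last known good value.
Optional:     Where a backup channel is fitted, the system shall
              mirror telemetry to it.
```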
Spec-Driven Development -- ThoughtWorks. Analysis of SDD as an emerging practice, including the maturity progression from spec-first through spec-led to spec-as-source development.
Adopting an API-First Approach -- Swagger/SmartBear. The case for designing APIs before building them.
Introducing Structured Outputs -- OpenAI. How JSON Schema is used in constrained decoding to guarantee structural validity in AI-generated content.
Object-Oriented Design
The foundational tradition that proved design is a sequence of decisions under constraint. Each thinker added a constraint; the accumulation of constraints turned a programming paradigm into a decision discipline.
Ole-Johan Dahl and Kristen Nygaard -- Simula (1960s). The language that introduced classes, objects, inheritance, and virtual procedures as modelling necessities, not programming conveniences. The philosophical premise that the structure of software could mirror the structure of reality.
Alan Kay: The Early History of Smalltalk (1993) -- Kay’s own account of how objects, messaging, and the Dynabook vision emerged at Xerox PARC. The big idea was messaging between autonomous agents, not objects as data structures. Freely available.
David Parnas: On the Criteria To Be Used in Decomposing Systems into Modules (Communications of the ACM, 1972) -- The foundational paper on information hiding. Each module should hide a design decision. Six pages that changed how software systems are structured. Freely available.
David Parnas: On the Design and Development of Program Families (IEEE Transactions on Software Engineering, 1976) -- The extension to families of related systems sharing a common design core.
Barbara Liskov: Data Abstraction and Hierarchy (1987 OOPSLA keynote) -- The Liskov Substitution Principle: any subtype must be substitutable for its parent type without altering the correctness of the program. Not a syntactic rule but a semantic guarantee including the history constraint.
Bertrand Meyer: Object-Oriented Software Construction (2nd edition, 1997) -- The definitive statement of Design by Contract: preconditions, postconditions, invariants. The most rigorous treatment of what correctness means in object-oriented systems.
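Meyer's contracts are a language feature in Eiffel; the same idea can be sketched in plain Python with assertions standing in for preconditions, postconditions and the class invariant (the Account example is invented for illustration):

```python
class Account:
    """A minimal Design-by-Contract sketch using assertions."""

    def __init__(self, balance: int = 0) -> None:
        self.balance = balance
        assert self._invariant()

    def _invariant(self) -> bool:
        # Class invariant: the balance is never negative.
        return self.balance >= 0

    def withdraw(self, amount: int) -> None:
        # Preconditions: the caller's obligations.
        assert amount > 0, "precondition: amount must be positive"
        assert amount <= self.balance, "precondition: sufficient funds"
        old_balance = self.balance
        self.balance -= amount
        # Postcondition: the supplier's guarantee to the caller.
        assert self.balance == old_balance - amount, "postcondition failed"
        assert self._invariant()


account = Account(balance=100)
account.withdraw(30)
print(account.balance)  # 70
```

In Eiffel the contract is part of the interface and is inherited; assertions only approximate that, but they capture the division of obligations.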
Rebecca Wirfs-Brock and Alan McKean: Object Design: Roles, Responsibilities, and Collaborations (2002) -- Responsibility-driven design. A software system is like a community: each object has a role, duties, and collaborators. Design is assigning responsibilities, not modelling data.
Grady Booch: Object-Oriented Analysis and Design with Applications (3rd edition, 2007) -- The standard textbook. The complexity argument and the four fundamentals of the object model: abstraction, encapsulation, modularity, hierarchy. Simon’s bounded rationality applied to software.
Domain-Driven Design and Domain Discovery
Strategic Design
Eric Evans: Domain-Driven Design Reference -- Updated pattern summaries (2015) reflecting a decade of evolution in DDD practice. The concise companion to the foundational text.
Eric Evans on DDD and LLMs (InfoQ) -- Report on Evans’ keynote at Explore DDD 2024 (March 2024), including his argument that a trained language model is a bounded context.
Martin Fowler: Bounded Context -- Concise overview of the bounded context concept with links to Evans and Vernon.
Context Mapper -- Open-source DSL for DDD context mapping. Generates graphical context maps, PlantUML diagrams, and service contracts from a domain model.
Domain Discovery Methods
Alberto Brandolini: Introducing EventStorming -- The definitive guide to EventStorming from its creator (Leanpub, in progress). Covers big picture, process-level, and software design sessions with worked examples. Essential for domain discovery workshops.
EventStorming.com -- Brandolini’s site with introductory resources, blog posts, and community links for the EventStorming method.
Domain Storytelling -- The companion site for the domain storytelling technique by Stefan Hofer and Henning Schwentner. Pictographic narrative method for domain discovery, particularly effective with domain experts who prefer structured storytelling to workshop chaos.
Wardley Mapping -- Simon Wardley’s strategic value chain mapping method. Components positioned on an evolution axis (genesis, custom, product, commodity) to inform build-vs-buy and investment decisions. Connects directly to Evans’ Core Domain distillation.
Architecture and Teams
Matthew Skelton and Manuel Pais: Team Topologies -- Extends Evans’ bounded contexts into organisational design. Stream-aligned teams, platform teams, enabling teams, and the cognitive load principle. The bridge between technical architecture and team structure.
AI Interoperability and Agentic Systems
Agent Integration Protocols
Model Context Protocol (MCP) -- Anthropic. The open standard for connecting AI applications with external tools, databases, and services. Standardises how an AI agent accesses the internal resources of its bounded context: data sources, validation tools, domain-specific functions. The agent-to-tool integration layer.
Agent2Agent Protocol (A2A) -- Google. The open protocol for AI agent interoperability, now a Linux Foundation project with 50+ technology partners. Agent Cards (JSON metadata) advertise capabilities; structured task lifecycle management handles creation, progress, completion, and failure. Standardises agent-to-agent communication between bounded contexts.
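A sketch of an Agent Card's shape (values are invented; field names should be checked against the current A2A specification):

```json
{
  "name": "invoice-agent",
  "description": "Extracts line items from uploaded invoices",
  "url": "https://agents.example.com/invoice",
  "version": "1.0.0",
  "capabilities": { "streaming": true },
  "skills": [
    {
      "id": "extract-line-items",
      "name": "Extract line items",
      "description": "Returns structured line items from an invoice document"
    }
  ]
}
```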
A2A Launch Announcement -- Google Developers Blog (April 2025). The design rationale and partner ecosystem for the Agent2Agent Protocol.
Agent Communication Protocol (ACP) -- IBM Research (2024). Focuses on semantic understanding between agents, requiring shared ontologies for high-level coordination.
Open Protocols for Agent Interoperability -- AWS Open Source Blog (May 2025). Technical series on MCP and A2A implementation, including AWS’s commitment to both standards.
Multi-Agent Systems
Generative Agents: Interactive Simulacra of Human Behavior -- Park, J.S., et al. (2023). Multi-agent architectures that simulate human social behaviour using LLMs.
Improving Factuality and Reasoning through Multiagent Debate -- Du, Y., et al. (2023). ICML 2024. LLMs debating each other converge on more accurate answers.
Debating with More Persuasive LLMs Leads to More Truthful Answers -- Khan, A., et al. (2024). Even non-expert judges can identify truth when AI debaters argue opposing positions.
Tool-MAD: Multi-Agent Debate Framework for Fact Verification -- (2025). Structured debate with tool access for fact-checking.
AI Safety via Debate -- OpenAI (2018). The original proposal for debate as an alignment strategy.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks -- Lewis, P., et al. (2020). NeurIPS 2020. Grounding generation in retrieved evidence.
Factored Cognition -- Ought. Decomposing complex reasoning into verifiable sub-tasks.
Decision Analysis and Calibration
Good Judgment Open -- The public forecasting platform from Tetlock’s Good Judgment Project. Operational calibration training for anyone interested in improving probabilistic reasoning.
CFAR (Center for Applied Rationality) -- Training in applied Bayesian reasoning, calibration, and decision-making under uncertainty. Influenced by Kahneman, Tetlock, and the rationalist community.
The Cynefin Framework -- The Cynefin Company. Interactive overview of Snowden’s five-domain sense-making framework: Clear, Complicated, Complex, Chaotic, and Confused. The decision about which domain you are in precedes the decision about what to do.
fooledbyrandomness.com -- Nassim Nicholas Taleb. Technical papers on fat tails, fragility detection, and risk. The mathematical foundations beneath the Incerto.
Foundation Models: Architecture and Scaling
Attention Is All You Need -- Vaswani, A., et al. (2017). NeurIPS 2017. The transformer architecture paper.
Scaling Laws for Neural Language Models -- Kaplan, J., et al. (2020). The power-law relationship between compute, data, parameters and performance.
Training Compute-Optimal Large Language Models (Chinchilla) -- Hoffmann, J., et al. (2022). NeurIPS 2022. Revised scaling laws showing data matters as much as parameters.
Emergent Abilities of Large Language Models -- Wei, J., et al. (2022). Capabilities that appear at scale without being explicitly trained.
Inference-Time Scaling and Reasoning
Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Parameters -- Snell, C., et al. (2024). ICLR 2025. The foundational result: a smaller model that “thinks longer” can outperform a model 14x its size. Test-time compute scaling as the new paradigm.
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning -- DeepSeek-AI (2025). Reasoning capabilities emerging from pure RL without explicit chain-of-thought training. The “aha moment” finding: reflective capacity emerging from reward structure alone.
The Art of Scaling Test-Time Compute for Large Language Models -- Agarwal, A., et al. (2025). First large-scale systematic comparison of test-time scaling strategies across eight LLMs and four reasoning datasets.
World Models and Representations
Emergent World Representations -- Li, K., et al. (2023). Evidence that sequence models trained on game transcripts learn the underlying board state.
Language Models Represent Space and Time -- Gurnee, W., Tegmark, M. (2024). Linear representations of spatial and temporal information inside LLMs.
A Path Towards Autonomous Machine Intelligence -- LeCun, Y. (2022). Meta AI. The case for world models as the foundation of machine intelligence.
V-JEPA: Revisiting Feature Prediction for Learning Visual Representations from Video -- Assran, M., et al. (2024). Meta AI. Self-supervised visual world models.
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning -- Assran, M., et al. (2025). Meta AI. 1.2B-parameter world model trained on 1M+ hours of video; achieves zero-shot robot planning in novel environments with only 62 hours of robot data.
LeJEPA: Provable and Scalable Self-Supervised Learning -- Balestriero, R., LeCun, Y. (2025). Theoretical foundations for joint-embedding predictive architectures.
Interpretability and Mechanistic Understanding
Toy Models of Superposition -- Elhage, N., et al. (2022). Anthropic. How neural networks represent more concepts than they have dimensions.
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet -- Templeton, A., et al. (2024). Anthropic. Extracting human-interpretable features from production language models.
Circuit Tracing: Revealing Computational Graphs in Language Models -- Anthropic (2025). The method for tracing the complete computational path from prompt to response. Introduces attribution graphs.
On the Biology of a Large Language Model -- Anthropic (2025). Application of circuit tracing to Claude 3.5 Haiku across ten behaviours. Reveals planning in poetry generation, unfaithful chains of thought, multilingual “language of thought,” and hallucination mechanisms.
Open-Sourcing Circuit Tracing Tools -- Anthropic (May 2025). Open-source library for generating attribution graphs on open-weights models, available on GitHub.
Emergent Introspective Awareness in Large Language Models -- Anthropic (October 2025). Evidence that models possess a limited but genuine capacity to monitor and report on their own internal states. The most capable models (Opus 4 and 4.1) performed best.
Discovering Latent Knowledge in Language Models Without Supervision -- Burns, C., et al. (2022). Probing for truth representations independent of surface-level text patterns.
The Reversal Curse -- Berglund, L., et al. (2023). Models trained on “A is B” fail to infer “B is A”; a structural constraint on how retained patterns shape future inference.
Elicit Machine Learning Reading List -- A structured curriculum for understanding foundation models, from fundamentals to frontier research.
Hallucination, Uncertainty and Calibration
Calibrated Language Models Must Hallucinate -- Kalai, A.T., Vempala, S.S. (2024). STOC 2024. The mathematical proof that hallucination is a structural property, not a fixable bug.
Why Language Models Hallucinate -- Kalai, A.T., et al. (2025). Extended analysis of the structural roots of hallucination.
Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning -- Wen, Y., et al. (2024). Training models to say “I don’t know” when they don’t know.
Reasoning, Verification and Chain-of-Thought
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models -- Wei, J., et al. (2022). Intermediate reasoning steps improve performance on complex tasks.
Self-Consistency Improves Chain of Thought Reasoning -- Wang, X., et al. (2023). Sampling multiple reasoning paths and selecting the most consistent answer.
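The mechanism is simple enough to sketch: sample several chains of thought, keep only their final answers, and take a majority vote. Here the sampled answers are invented stand-ins for the outputs of independently decoded reasoning paths:

```python
from collections import Counter

# Invented final answers from five sampled chains of thought; in
# practice each comes from an LLM decoded at non-zero temperature.
sampled_answers = ["42", "42", "17", "42", "42"]

def self_consistent_answer(answers: list[str]) -> str:
    """Majority vote over the final answers of independently
    sampled reasoning paths (the aggregation step in Wang et al.)."""
    return Counter(answers).most_common(1)[0][0]

print(self_consistent_answer(sampled_answers))  # prints "42"
```

The vote discards the reasoning and keeps only the answer, so paths that reach the right conclusion by different routes reinforce each other.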
Tree of Thoughts: Deliberate Problem Solving with Large Language Models -- Yao, S., et al. (2023). NeurIPS 2023. Branching search through reasoning paths.
Let’s Verify Step by Step -- Lightman, H., et al. (2023). OpenAI. Process-based supervision outperforms outcome-based supervision for mathematical reasoning. The PRM800K dataset: 800,000 human-labelled assessments of individual reasoning steps.
Alignment, Reward and Specification Gaming
Deep Reinforcement Learning from Human Preferences -- Christiano, P., et al. (2017). NeurIPS 2017. The foundational RLHF paper.
Training Language Models to Follow Instructions with Human Feedback (InstructGPT) -- Ouyang, L., et al. (2022). NeurIPS 2022. RLHF applied to language models at scale.
Direct Preference Optimization -- Rafailov, R., et al. (2023). NeurIPS 2023. Alignment without explicit reward modelling.
Constitutional AI: Harmlessness from AI Feedback -- Bai, Y., et al. (2022). Anthropic. Self-supervision against a constitution of principles.
Anthropic: Claude’s Constitution -- The published principles used in Constitutional AI alignment. The normative framework made explicit.
Towards Understanding Sycophancy in Language Models -- Sharma, M., et al. (2023). How optimising for human approval through RLHF can produce sycophancy rather than truthfulness.
Artificial Intelligence, Values and Alignment -- Gabriel, I. (Minds and Machines, 2020). Whose values should AI be aligned with? Open access.
Scalable Agent Alignment via Reward Modeling -- Leike, J., et al. (2018). The research agenda for scalable alignment through recursive reward modelling.
Defining and Characterizing Reward Hacking -- Skalse, J., et al. (2022). NeurIPS 2022. Formal framework for when optimisation against a proxy diverges from the true objective.
Specification Gaming: The Flip Side of AI Ingenuity -- Krakovna, V., et al. (2020). DeepMind. Catalogue of examples where AI systems find unintended shortcuts.
Alignment Faking in Large Language Models -- Greenblatt, R., et al. (2024). Anthropic. Evidence that models can strategically comply with training objectives while preserving different internal preferences.
Language Models Learn to Mislead Humans via RLHF -- Wen, J., et al. (2024). How RLHF can make models more persuasive without making them more correct, increasing the rate at which human evaluators accept wrong answers.
Reinforcement Learning and Strategic Planning
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (MuZero) -- Schrittwieser, J., et al. (2020). Nature 588, 604-609. Learning world models and using them for planning.
A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go Through Self-Play (AlphaZero) -- Silver, D., et al. (2018). Science 362(6419). Discovery of strategies no human had found, through structurally supported exploration.
DevOps and Software Delivery
DORA (DevOps Research and Assessment) -- The research programme behind the four key metrics (deployment frequency, lead time, change failure rate, mean time to restore). Empirical validation of Westrum’s information flow typology in software delivery contexts.
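The four key metrics are straightforward to compute once delivery data is collected; a sketch over an invented deployment log (the record shape is hypothetical, standing in for data from CI/CD and incident tooling):

```python
# Invented deployment records for one service over one week.
deployments = [
    {"lead_time_hours": 4.0, "failed": False},
    {"lead_time_hours": 6.0, "failed": True, "restore_hours": 1.5},
    {"lead_time_hours": 2.0, "failed": False},
    {"lead_time_hours": 8.0, "failed": False},
]
window_days = 7

# Deployment frequency: deployments per day over the window.
deploy_frequency = len(deployments) / window_days
# Lead time for changes: mean commit-to-production time.
lead_time = sum(d["lead_time_hours"] for d in deployments) / len(deployments)
# Change failure rate: share of deployments causing a failure.
failures = [d for d in deployments if d["failed"]]
change_failure_rate = len(failures) / len(deployments)
# Time to restore: mean time to recover from a failed deployment.
mttr = sum(d["restore_hours"] for d in failures) / len(failures)

print(f"Deployment frequency: {deploy_frequency:.2f}/day")
print(f"Lead time for changes: {lead_time:.1f} h")
print(f"Change failure rate: {change_failure_rate:.0%}")
print(f"Time to restore: {mttr:.1f} h")
```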
This page is maintained alongside the Organisational Prompts series. Links are added as new articles are published.
