Technology Links
Links to online technical sources in DevOps, Software Engineering, Domain-Driven Design, Specification-Driven Development and AI, referenced in published or upcoming articles. These links will be updated as new articles are released.
Last updated: 5 April 2026
Specification-Driven Development
Standards and Foundations
JSON Schema -- The foundational vocabulary for describing the structure of data. The constraint language underlying most specification standards and LLM structured output features.
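A minimal illustration of the vocabulary (the field names are invented for illustration):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "orderId": { "type": "string", "format": "uuid" },
    "quantity": { "type": "integer", "minimum": 1 }
  },
  "required": ["orderId", "quantity"],
  "additionalProperties": false
}
```

The same constraint language appears in OpenAPI request bodies, LLM structured-output declarations, and standalone validation.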
OpenAPI 3.1 -- The dominant open standard for REST API specification. The interoperability layer that allows AI-generated APIs to be consumed by any compliant client.
AsyncAPI Specification -- The open standard for event-driven and message-based system specification. Extends OpenAPI patterns to publish-subscribe, message queue, and streaming architectures.
GraphQL Schema Definition Language -- GraphQL is inherently specification-driven: the SDL is required, not optional. The schema is the specification, and GraphQL introspection makes it discoverable at runtime.
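A sketch of what the SDL looks like (types invented for illustration):

```graphql
# The schema is the specification: client and server both depend on it,
# and introspection exposes it at runtime.
type Query {
  order(id: ID!): Order
}

type Order {
  id: ID!
  quantity: Int!
  status: OrderStatus!
}

enum OrderStatus {
  PENDING
  SHIPPED
}
```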
Protocol Buffers (gRPC) -- Google’s language-neutral, platform-neutral mechanism for serialising structured data. Defines services and message types for RPC-based architectures.
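An illustrative proto3 definition (service and message names are invented):

```proto
syntax = "proto3";

package orders.v1;

// A service definition: the contract both client and server stubs
// are generated from.
service OrderService {
  rpc GetOrder (GetOrderRequest) returns (Order);
}

message GetOrderRequest {
  string order_id = 1;
}

message Order {
  string order_id = 1;
  int32 quantity = 2;
}
```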
Pact Contract Testing -- Consumer-driven contract testing. The safety net that ensures iterative specification changes do not break downstream dependencies.
Tooling and IDEs
GitHub Spec Kit -- Open-source toolkit for specification-driven development. Agent-agnostic CLI with structured commands (/specify, /plan, /tasks) that work across Copilot, Claude Code, Gemini CLI, Cursor and other AI coding agents. Introduces the concept of a constitution: non-negotiable project principles that constrain specifications.
AWS Kiro -- Agentic IDE built on Code OSS that embeds specification-driven development into the development environment. Generates EARS-notation acceptance criteria, technical designs and implementation task lists from natural language requirements. Includes event-driven “hooks” that automatically update tests and documentation when code changes.
JUXT Allium -- LLM-native behavioural specification language. Describes events, preconditions and outcomes in formal syntax designed for AI consumption. Includes elicitation and distillation workflows for building and extracting specifications. Addresses the gap between what structural schemas can express and what behavioural specifications require.
Methods and Practice
EARS Notation -- Alistair Mavin. The Easy Approach to Requirements Syntax. Keyword-based patterns (When, While, Where, If/Then) for writing precise, testable natural-language requirements. Originally developed at Rolls-Royce for airworthiness certification; adopted by Kiro for AI-assisted specification.
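Illustrative requirements in the core EARS patterns (the avionics examples are invented, not drawn from the Rolls-Royce corpus):

```text
Ubiquitous:   The control system shall log every mode change.
Event-driven: When the landing gear is deployed, the system shall
              illuminate the gear-down indicator.
State-driven: While in maintenance mode, the system shall inhibit
              engine start.
Unwanted:     If the sensor signal is lost, then the system shall
              revert to the last known good value.
Optional:     Where a backup channel is fitted, the system shall
              mirror telemetry to it.
```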
Spec-Driven Development -- ThoughtWorks. Analysis of SDD as an emerging practice, including the maturity progression from spec-first through spec-led to spec-as-source development.
Adopting an API-First Approach -- Swagger/SmartBear. The case for designing APIs before building them.
Introducing Structured Outputs -- OpenAI. How JSON Schema is used in constrained decoding to guarantee structural validity in AI-generated content.
Object-Oriented Design
The foundational tradition that proved design is a sequence of decisions under constraint. Each thinker added a constraint; the accumulation of constraints turned a programming paradigm into a decision discipline.
Ole-Johan Dahl and Kristen Nygaard -- Simula (1960s). The language that introduced classes, objects, inheritance, and virtual procedures as modelling necessities, not programming conveniences. The philosophical premise that the structure of software could mirror the structure of reality.
Alan Kay: The Early History of Smalltalk (1993) -- Kay’s own account of how objects, messaging, and the Dynabook vision emerged at Xerox PARC. The big idea was messaging between autonomous agents, not objects as data structures. Freely available.
David Parnas: On the Criteria To Be Used in Decomposing Systems into Modules (Communications of the ACM, 1972) -- The foundational paper on information hiding. Each module should hide a design decision. Six pages that changed how software systems are structured. Freely available.
David Parnas: On the Design and Development of Program Families (IEEE Transactions on Software Engineering, 1976) -- The extension to families of related systems sharing a common design core.
Barbara Liskov: Data Abstraction and Hierarchy (1987 OOPSLA keynote) -- The Liskov Substitution Principle: any subtype must be substitutable for its parent type without altering the correctness of the program. Not a syntactic rule but a semantic guarantee including the history constraint.
Bertrand Meyer: Object-Oriented Software Construction (2nd edition, 1997) -- The definitive statement of Design by Contract: preconditions, postconditions, invariants. The most rigorous treatment of what correctness means in object-oriented systems.
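Meyer's contracts are a language feature in Eiffel; the same idea can be sketched in plain Python with assertions standing in for preconditions, postconditions and the class invariant (the Account example is invented for illustration):

```python
class Account:
    """A minimal Design-by-Contract sketch using assertions."""

    def __init__(self, balance: int = 0) -> None:
        self.balance = balance
        assert self._invariant()

    def _invariant(self) -> bool:
        # Class invariant: the balance is never negative.
        return self.balance >= 0

    def withdraw(self, amount: int) -> None:
        # Preconditions: the caller's obligations.
        assert amount > 0, "precondition: amount must be positive"
        assert amount <= self.balance, "precondition: sufficient funds"
        old_balance = self.balance
        self.balance -= amount
        # Postcondition: the supplier's guarantee to the caller.
        assert self.balance == old_balance - amount, "postcondition failed"
        assert self._invariant()


account = Account(balance=100)
account.withdraw(30)
print(account.balance)  # 70
```

In Eiffel the contract is part of the interface and is inherited; assertions only approximate that, but they capture the division of obligations.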
Rebecca Wirfs-Brock and Alan McKean: Object Design: Roles, Responsibilities, and Collaborations (2002) -- Responsibility-driven design. A software system is like a community: each object has a role, duties, and collaborators. Design is assigning responsibilities, not modelling data.
Grady Booch: Object-Oriented Analysis and Design with Applications (3rd edition, 2007) -- The standard textbook. The complexity argument and the four fundamentals of the object model: abstraction, encapsulation, modularity, hierarchy. Simon’s bounded rationality applied to software.
Domain-Driven Design and Domain Discovery
Strategic Design
Eric Evans: Domain-Driven Design Reference -- Updated pattern summaries (2015) reflecting a decade of evolution in DDD practice. The concise companion to the foundational text.
Eric Evans on DDD and LLMs (InfoQ) -- Report on Evans’ keynote at Explore DDD 2024 (March 2024), including his argument that a trained language model is a bounded context.
Martin Fowler: Bounded Context -- Concise overview of the bounded context concept with links to Evans and Vernon.
Context Mapper -- Open-source DSL for DDD context mapping. Generates graphical context maps, PlantUML diagrams, and service contracts from a domain model.
Domain Discovery Methods
Alberto Brandolini: Introducing EventStorming -- The definitive guide to EventStorming from its creator (Leanpub, in progress). Covers big picture, process-level, and software design sessions with worked examples. Essential for domain discovery workshops.
EventStorming.com -- Brandolini’s site with introductory resources, blog posts, and community links for the EventStorming method.
Domain Storytelling -- The companion site for the domain storytelling technique by Stefan Hofer and Henning Schwentner. Pictographic narrative method for domain discovery, particularly effective with domain experts who prefer structured storytelling to workshop chaos.
Wardley Mapping -- Simon Wardley’s strategic value chain mapping method. Components positioned on an evolution axis (genesis, custom, product, commodity) to inform build-vs-buy and investment decisions. Connects directly to Evans’ Core Domain distillation.
Architecture and Teams
Matthew Skelton and Manuel Pais: Team Topologies -- Extends Evans’ bounded contexts into organisational design. Stream-aligned teams, platform teams, enabling teams, and the cognitive load principle. The bridge between technical architecture and team structure.
AI Interoperability and Agentic Systems
Agent Integration Protocols
Model Context Protocol (MCP) -- Anthropic. The open standard for connecting AI applications with external tools, databases, and services. Standardises how an AI agent accesses the internal resources of its bounded context: data sources, validation tools, domain-specific functions. The agent-to-tool integration layer.
Agent2Agent Protocol (A2A) -- Google. The open protocol for AI agent interoperability, now a Linux Foundation project with 50+ technology partners. Agent Cards (JSON metadata) advertise capabilities; structured task lifecycle management handles creation, progress, completion, and failure. Standardises agent-to-agent communication between bounded contexts.
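A sketch of an Agent Card's shape (values are invented; field names should be checked against the current A2A specification):

```json
{
  "name": "invoice-agent",
  "description": "Extracts line items from uploaded invoices",
  "url": "https://agents.example.com/invoice",
  "version": "1.0.0",
  "capabilities": { "streaming": true },
  "skills": [
    {
      "id": "extract-line-items",
      "name": "Extract line items",
      "description": "Returns structured line items from an invoice document"
    }
  ]
}
```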
A2A Launch Announcement -- Google Developers Blog (April 2025). The design rationale and partner ecosystem for the Agent2Agent Protocol.
Agent Communication Protocol (ACP) -- IBM Research (2024). Focuses on semantic understanding between agents, requiring shared ontologies for high-level coordination.
Open Protocols for Agent Interoperability -- AWS Open Source Blog (May 2025). Technical series on MCP and A2A implementation, including AWS’s commitment to both standards.
Multi-Agent Systems
Generative Agents: Interactive Simulacra of Human Behavior -- Park, J.S., et al. (2023). Multi-agent architectures that simulate human social behaviour using LLMs.
Improving Factuality and Reasoning through Multiagent Debate -- Du, Y., et al. (2023). ICML 2024. LLMs debating each other converge on more accurate answers.
Debating with More Persuasive LLMs Leads to More Truthful Answers -- Khan, A., et al. (2024). Even non-expert judges can identify truth when AI debaters argue opposing positions.
Tool-MAD: Multi-Agent Debate Framework for Fact Verification -- (2025). Structured debate with tool access for fact-checking.
AI Safety via Debate -- OpenAI (2018). The original proposal for debate as an alignment strategy.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks -- Lewis, P., et al. (2020). NeurIPS 2020. Grounding generation in retrieved evidence.
Factored Cognition -- Ought. Decomposing complex reasoning into verifiable sub-tasks.
Decision Analysis and Calibration
Good Judgment Open -- The public forecasting platform from Tetlock’s Good Judgment Project. Operational calibration training for anyone interested in improving probabilistic reasoning.
CFAR (Center for Applied Rationality) -- Training in applied Bayesian reasoning, calibration, and decision-making under uncertainty. Influenced by Kahneman, Tetlock, and the rationalist community.
The Cynefin Framework -- The Cynefin Company. Interactive overview of Snowden’s five-domain sense-making framework: Clear, Complicated, Complex, Chaotic, and Confused. The decision about which domain you are in precedes the decision about what to do.
fooledbyrandomness.com -- Nassim Nicholas Taleb. Technical papers on fat tails, fragility detection, and risk. The mathematical foundations beneath the Incerto.
Foundation Models: Architecture and Scaling
Attention Is All You Need -- Vaswani, A., et al. (2017). NeurIPS 2017. The transformer architecture paper.
Scaling Laws for Neural Language Models -- Kaplan, J., et al. (2020). The power-law relationship between compute, data, parameters and performance.
Training Compute-Optimal Large Language Models (Chinchilla) -- Hoffmann, J., et al. (2022). NeurIPS 2022. Revised scaling laws showing data matters as much as parameters.
Emergent Abilities of Large Language Models -- Wei, J., et al. (2022). Capabilities that appear at scale without being explicitly trained.
Inference-Time Scaling and Reasoning
Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Parameters -- Snell, C., et al. (2024). ICLR 2025. The foundational result: a smaller model that “thinks longer” can outperform a model 14x its size. Test-time compute scaling as the new paradigm.
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning -- DeepSeek-AI (2025). Reasoning capabilities emerging from pure RL without explicit chain-of-thought training. The “aha moment” finding: reflective capacity emerging from reward structure alone.
The Art of Scaling Test-Time Compute for Large Language Models -- Agarwal, A., et al. (2025). First large-scale systematic comparison of test-time scaling strategies across eight LLMs and four reasoning datasets.
World Models and Representations
Emergent World Representations -- Li, K., et al. (2023). Evidence that sequence models trained on game transcripts learn the underlying board state.
Language Models Represent Space and Time -- Gurnee, W., Tegmark, M. (2024). Linear representations of spatial and temporal information inside LLMs.
A Path Towards Autonomous Machine Intelligence -- LeCun, Y. (2022). Meta AI. The case for world models as the foundation of machine intelligence.
V-JEPA: Revisiting Feature Prediction for Learning Visual Representations from Video -- Assran, M., et al. (2024). Meta AI. Self-supervised visual world models.
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning -- Assran, M., et al. (2025). Meta AI. 1.2B-parameter world model trained on 1M+ hours of video; achieves zero-shot robot planning in novel environments with only 62 hours of robot data.
LeJEPA: Provable and Scalable Self-Supervised Learning -- Balestriero, R., LeCun, Y. (2025). Theoretical foundations for joint-embedding predictive architectures.
Interpretability and Mechanistic Understanding
Toy Models of Superposition -- Elhage, N., et al. (2022). Anthropic. How neural networks represent more concepts than they have dimensions.
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet -- Templeton, A., et al. (2024). Anthropic. Extracting human-interpretable features from production language models.
Circuit Tracing: Revealing Computational Graphs in Language Models -- Anthropic (2025). The method for tracing the complete computational path from prompt to response. Introduces attribution graphs.
On the Biology of a Large Language Model -- Anthropic (2025). Application of circuit tracing to Claude 3.5 Haiku across ten behaviours. Reveals planning in poetry generation, unfaithful chains of thought, multilingual “language of thought,” and hallucination mechanisms.
Open-Sourcing Circuit Tracing Tools -- Anthropic (May 2025). Open-source library for generating attribution graphs on open-weights models, available on GitHub.
Emergent Introspective Awareness in Large Language Models -- Anthropic (October 2025). Evidence that models possess a limited but genuine capacity to monitor and report on their own internal states. The most capable models (Opus 4 and 4.1) performed best.
Discovering Latent Knowledge in Language Models Without Supervision -- Burns, C., et al. (2022). Probing for truth representations independent of surface-level text patterns.
The Reversal Curse -- Berglund, L., et al. (2023). Models trained on “A is B” fail to infer “B is A”; a structural constraint on how retained patterns shape future inference.
Elicit Machine Learning Reading List -- A structured curriculum for understanding foundation models, from fundamentals to frontier research.
Hallucination, Uncertainty and Calibration
Calibrated Language Models Must Hallucinate -- Kalai, A.T., Vempala, S.S. (2024). STOC 2024. The mathematical proof that hallucination is a structural property, not a fixable bug.
Why Language Models Hallucinate -- Kalai, A.T., et al. (2025). Extended analysis of the structural roots of hallucination.
Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning -- Wen, Y., et al. (2024). Training models to say “I don’t know” when they don’t know.
Reasoning, Verification and Chain-of-Thought
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models -- Wei, J., et al. (2022). Intermediate reasoning steps improve performance on complex tasks.
Self-Consistency Improves Chain of Thought Reasoning -- Wang, X., et al. (2023). Sampling multiple reasoning paths and selecting the most consistent answer.
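The mechanism is simple enough to sketch: sample several chains of thought, keep only their final answers, and take a majority vote. Here the sampled answers are invented stand-ins for the outputs of independently decoded reasoning paths:

```python
from collections import Counter

# Invented final answers from five sampled chains of thought; in
# practice each comes from an LLM decoded at non-zero temperature.
sampled_answers = ["42", "42", "17", "42", "42"]

def self_consistent_answer(answers: list[str]) -> str:
    """Majority vote over the final answers of independently
    sampled reasoning paths (the aggregation step in Wang et al.)."""
    return Counter(answers).most_common(1)[0][0]

print(self_consistent_answer(sampled_answers))  # prints "42"
```

The vote discards the reasoning and keeps only the answer, so paths that reach the right conclusion by different routes reinforce each other.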
Tree of Thoughts: Deliberate Problem Solving with Large Language Models -- Yao, S., et al. (2023). NeurIPS 2023. Branching search through reasoning paths.
Let’s Verify Step by Step -- Lightman, H., et al. (2023). OpenAI. Process-based supervision outperforms outcome-based supervision for mathematical reasoning. The PRM800K dataset: 800,000 human-labelled assessments of individual reasoning steps.
Alignment, Reward and Specification Gaming
Deep Reinforcement Learning from Human Preferences -- Christiano, P., et al. (2017). NeurIPS 2017. The foundational RLHF paper.
Training Language Models to Follow Instructions with Human Feedback (InstructGPT) -- Ouyang, L., et al. (2022). NeurIPS 2022. RLHF applied to language models at scale.
Direct Preference Optimization -- Rafailov, R., et al. (2023). NeurIPS 2023. Alignment without explicit reward modelling.
Constitutional AI: Harmlessness from AI Feedback -- Bai, Y., et al. (2022). Anthropic. Self-supervision against a constitution of principles.
Anthropic: Claude’s Constitution -- The published principles used in Constitutional AI alignment. The normative framework made explicit.
Towards Understanding Sycophancy in Language Models -- Sharma, M., et al. (2023). How optimising for human approval through RLHF can produce sycophancy rather than truthfulness.
Artificial Intelligence, Values and Alignment -- Gabriel, I. (Minds and Machines, 2020). Whose values should AI be aligned with? Open access.
Scalable Agent Alignment via Reward Modeling -- Leike, J., et al. (2018). The research agenda for scalable alignment through recursive reward modelling.
Defining and Characterizing Reward Hacking -- Skalse, J., et al. (2022). NeurIPS 2022. Formal framework for when optimisation against a proxy diverges from the true objective.
Specification Gaming: The Flip Side of AI Ingenuity -- Krakovna, V., et al. (2020). DeepMind. Catalogue of examples where AI systems find unintended shortcuts.
Alignment Faking in Large Language Models -- Greenblatt, R., et al. (2024). Anthropic. Evidence that models can strategically comply with training objectives while preserving different internal preferences.
Language Models Learn to Mislead Humans via RLHF -- Wen, J., et al. (2024). How RLHF can make models more persuasive without making them more correct, increasing the rate at which human evaluators accept wrong answers.
Reinforcement Learning and Strategic Planning
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (MuZero) -- Schrittwieser, J., et al. (2020). Nature 588, 604-609. Learning world models and using them for planning.
A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go Through Self-Play (AlphaZero) -- Silver, D., et al. (2018). Science 362(6419). Discovery of strategies no human had found, through structurally supported exploration.
DevOps and Software Delivery
DORA (DevOps Research and Assessment) -- The research programme behind the four key metrics (deployment frequency, lead time, change failure rate, mean time to restore). Empirical validation of Westrum’s information flow typology in software delivery contexts.
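The four key metrics are straightforward to compute once delivery data is collected; a sketch over an invented deployment log (the record shape is hypothetical, standing in for data from CI/CD and incident tooling):

```python
# Invented deployment records for one service over one week.
deployments = [
    {"lead_time_hours": 4.0, "failed": False},
    {"lead_time_hours": 6.0, "failed": True, "restore_hours": 1.5},
    {"lead_time_hours": 2.0, "failed": False},
    {"lead_time_hours": 8.0, "failed": False},
]
window_days = 7

# Deployment frequency: deployments per day over the window.
deploy_frequency = len(deployments) / window_days
# Lead time for changes: mean commit-to-production time.
lead_time = sum(d["lead_time_hours"] for d in deployments) / len(deployments)
# Change failure rate: share of deployments causing a failure.
failures = [d for d in deployments if d["failed"]]
change_failure_rate = len(failures) / len(deployments)
# Time to restore: mean time to recover from a failed deployment.
mttr = sum(d["restore_hours"] for d in failures) / len(failures)

print(f"Deployment frequency: {deploy_frequency:.2f}/day")
print(f"Lead time for changes: {lead_time:.1f} h")
print(f"Change failure rate: {change_failure_rate:.0%}")
print(f"Time to restore: {mttr:.1f} h")
```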
This page is maintained alongside the Organisational Prompts series. Links are added as new articles are published.
