#spec-driven-development #ai-agents #software-process #testing #architecture #compliance

Spec-Driven Development: Drift, Trade-offs, and the Roles That Must Adapt

When the spec is the source and the agent is the compiler, drift becomes the new bug class — and every classical role has to retool around it.

Spec-driven development (SDD) is the practice of treating a structured specification (not the code) as the durable artifact, and letting a coding agent generate the implementation from it. The promise is appealing: less prose-to-code translation loss, faster feature delivery, fewer regressions. The catch is subtler: the moment generated code diverges from the spec that produced it, your “source of truth” silently becomes a lie.

What SDD actually changes

In a classical workflow, the code is canonical and documentation lags behind. In SDD, that polarity flips. The spec, typically a structured markdown document with requirements, design decisions, contracts, and task breakdowns, is the upstream artifact. Tools like GitHub Spec Kit and AWS Kiro formalize three phases before any code is generated (specify → plan → tasks in Spec Kit, requirements → design → tasks in Kiro), and they keep the spec alive afterwards so future changes flow through it.

// Spec-driven loop with drift checkpoints
flowchart LR
R[Requirements] --> P[Plan and contracts]
P --> T[Tasks]
T --> A((Coding agent))
A --> C[Generated code]
C --> V{Contract and golden tests}
V -- pass --> M[Merge]
V -- fail --> D[Drift report]
D --> R
M --> Re[Runtime conformance probes]
Re -. divergence .-> D

The level of commitment varies. Spec-first teams write a spec, generate once, then maintain the code. Spec-anchored teams keep both in sync. Spec-as-source teams regenerate from the spec on every change and treat the code as a build artifact. Each level pays a different drift tax.

The drift problem

Drift is the gap between what the spec says and what the implementation does. It shows up in four shapes:

  • Interpretation drift — the agent picks a reasonable but unintended path (JWT instead of session cookies) because the spec underspecified the contract.
  • Edit drift — a human patches the code directly to fix a bug and never reflects the change back into the spec.
  • Regeneration drift — the spec is edited, the agent regenerates a module, and a previously-passing behaviour quietly disappears because it lived in the code but not the spec.
  • Runtime drift — the deployed system evolves (config, schema, downstream APIs) and the spec’s assumptions stop holding.

The dangerous property of all four is that they compound silently. A spec that is 95% accurate can be worse than no spec, because reviewers trust it and stop checking the remaining 5%.

Techniques to detect and contain drift

The teams that survive SDD treat the spec like production code: versioned, tested, and continuously validated against the running system.

1. Make the spec executable where possible

Express contracts in formats a machine can check: OpenAPI for HTTP boundaries, JSON Schema for payloads, Gherkin or property-based predicates for behaviour. Anything written only in prose can drift unnoticed; anything compiled into a test fails the build when it does.

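# spec/contracts/orders.openapi.yaml
paths:
  /orders/{id}:
    get:
      responses:
        "200":
          description: The requested order
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Order"
components:
  schemas:
    Order:
      type: object
      # "required" lives on the schema itself; in OpenAPI 3.0, siblings of $ref are ignored
      required: [id, status, total_cents, currency]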

2. Golden tests as the spec’s anti-corruption layer

For each acceptance criterion, commit a frozen input/output pair. The agent is free to refactor the implementation between regenerations, but any change in a golden output is a drift event that must be either acknowledged (update the spec and the golden together) or rejected.

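// tests/golden/checkout.spec.ts
// Assumes a Jest-style runner (global test/expect) and "resolveJsonModule" enabled in tsconfig.
import { runScenario } from "../harness";
import golden from "./checkout.golden.json";

// The test name carries the requirement ID so a failure traces back to the spec.
test("checkout.scenario-7 matches spec/REQ-CHK-7", async () => {
  const actual = await runScenario("checkout/scenario-7");
  expect(actual).toEqual(golden); // diff is the drift report
});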
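
To make "the diff is the drift report" concrete, here is a minimal sketch of a structural diff between a golden output and an actual output. The names `diffGolden` and `DriftEntry` are hypothetical and not part of any harness mentioned above; a real runner's built-in diff may already be enough.

```typescript
// Hypothetical helper: lists every path where actual output departs from the golden file.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface DriftEntry {
  path: string; // slash-separated location of the difference
  expected: Json | undefined;
  actual: Json | undefined;
}

function isObject(v: Json | undefined): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function diffGolden(expected: Json | undefined, actual: Json | undefined, path = ""): DriftEntry[] {
  const out: DriftEntry[] = [];
  if (Array.isArray(expected) && Array.isArray(actual)) {
    const len = Math.max(expected.length, actual.length);
    for (let i = 0; i < len; i++) {
      out.push(...diffGolden(expected[i], actual[i], `${path}/${i}`));
    }
    return out;
  }
  if (isObject(expected) && isObject(actual)) {
    // Union of keys, so both removed and added fields are reported.
    const extra = Object.keys(actual).filter((k) => !(k in expected));
    for (const key of Object.keys(expected).concat(extra)) {
      out.push(...diffGolden(expected[key], actual[key], `${path}/${key}`));
    }
    return out;
  }
  // Leaves, or values of different kinds: report when they differ.
  return expected === actual ? [] : [{ path, expected, actual }];
}

const report = diffGolden(
  { total_cents: 1999, currency: "EUR", items: [{ sku: "A1" }] },
  { total_cents: 2099, currency: "EUR", items: [{ sku: "A1" }] },
);
// report contains a single entry, at path "/total_cents"
```

Attaching the list of paths to the failing check gives the reviewer a short drift report to acknowledge or reject, instead of a raw test failure.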

3. Traceability IDs everywhere

Every requirement gets a stable ID (REQ-CHK-7). It appears in the spec, the task list, the commit message, the test name, and ideally a code comment at the entry point. A pre-merge check that finds orphan tests or orphan requirements catches drift before review.
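
The orphan check can be sketched in a few lines. This assumes IDs follow the `REQ-<AREA>-<number>` shape of `REQ-CHK-7`; the function and field names are illustrative, and how the spec text and test names are collected is left to the build.

```typescript
// Hypothetical pre-merge check: cross-references requirement IDs in the spec with test names.
const REQ_ID = /REQ-[A-Z]+-\d+/g;

function extractIds(text: string): string[] {
  return text.match(REQ_ID) ?? [];
}

interface TraceReport {
  orphanRequirements: string[]; // present in the spec, cited by no test
  orphanTests: string[]; // test names that cite no requirement known to the spec
}

function checkTraceability(specText: string, testNames: string[]): TraceReport {
  const specIds = extractIds(specText);
  const covered: string[] = [];
  const orphanTests: string[] = [];
  for (const name of testNames) {
    const known = extractIds(name).filter((id) => specIds.indexOf(id) !== -1);
    if (known.length === 0) orphanTests.push(name);
    covered.push(...known);
  }
  const orphanRequirements = specIds.filter(
    (id, i) => specIds.indexOf(id) === i && covered.indexOf(id) === -1,
  );
  return { orphanRequirements, orphanTests };
}

const trace = checkTraceability(
  "REQ-CHK-7 Checkout totals include tax.\nREQ-CHK-8 Refunds are idempotent.",
  ["checkout.scenario-7 matches spec/REQ-CHK-7", "legacy rounding test"],
);
// trace.orphanRequirements is ["REQ-CHK-8"]; trace.orphanTests is ["legacy rounding test"]
```

Failing the merge when either list is non-empty is one possible policy; a softer variant would only warn on orphan tests.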

4. Small, validated increments

Regenerating an entire module is where regeneration drift hides. Breaking work into atomic tasks (one requirement, one diff, one test) keeps each checkpoint small enough for a human in the loop to review.

5. Runtime conformance probes

Contract drift continues after merge. Schedule consumer-driven contract tests against staging and production, and emit a drift event when a response no longer matches the schema in spec/.
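
A probe of this kind might look like the sketch below. It only checks that required fields are present, which is far weaker than full schema validation. The `Contract` shape, the `DriftEvent` fields, and the injected `fetchJson` function are assumptions made for illustration.

```typescript
// Hypothetical probe: compares a live response against the required fields of a contract.
interface Contract {
  name: string;
  required: string[]; // mirrors the "required" list of the schema kept in spec/
}

interface DriftEvent {
  contract: string;
  missing: string[];
  observedAt: string; // ISO 8601 timestamp
}

function checkConformance(body: unknown, contract: Contract, now: Date = new Date()): DriftEvent | null {
  const record = typeof body === "object" && body !== null ? (body as Record<string, unknown>) : {};
  const missing = contract.required.filter((field) => !(field in record));
  return missing.length === 0
    ? null
    : { contract: contract.name, missing, observedAt: now.toISOString() };
}

// fetchJson is injected so the same probe can target staging, production, or a stub.
async function probe(
  url: string,
  contract: Contract,
  fetchJson: (url: string) => Promise<unknown>,
): Promise<DriftEvent | null> {
  return checkConformance(await fetchJson(url), contract);
}

const orderContract: Contract = {
  name: "Order",
  required: ["id", "status", "total_cents", "currency"],
};
const event = checkConformance({ id: "o-1", status: "paid", total_cents: 1999 }, orderContract);
// event.missing is ["currency"]
```

In practice the presence check would likely be swapped for a JSON Schema validator driven by the same file the agent generated from, so the probe and the spec cannot disagree.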

How the classical roles must adapt

Knowing the techniques is not enough: each one must be owned by someone. SDD does not delete roles; it redistributes their leverage and makes each role responsible for preventing a specific category of drift. The techniques above are not a shared checklist. They map to individual accountabilities, and anyone who refuses to own theirs becomes both the bottleneck and the source of the drift that silently degrades the system.

Product Owner: owns interpretation drift

The PO moves from writing features and user stories to authoring, and owning the lifecycle of, the structured spec. The shift in responsibility is sharper than it sounds: every ambiguous acceptance criterion is an open invitation for the agent to make a reasonable but unintended choice — JWT instead of session cookies, pagination over streaming, a nullable field instead of a required one. That is interpretation drift, and it begins in the spec before any code is generated. Making the spec executable (technique 1) is, in practice, a PO deliverable: it is the PO who decides whether an acceptance criterion stays as prose or gets promoted to an OpenAPI schema or a Gherkin scenario — prose that a human can interpret charitably will be interpreted literally by an agent.

The PO also owns the spec’s change log: when a requirement shifts, the spec changes first — before the agent runs, before the code changes. Any requirement change that bypasses the spec is the starting condition for edit drift.

Software Architect: owns regeneration drift at module boundaries

The architect becomes the guardian of the meta-spec: the constraints, patterns, and non-functional requirements that every generated module must respect (security boundaries, data ownership, latency budgets, module interfaces). Their primary drift concern is what must remain invariant across regenerations. When a module is regenerated, the implementation can change freely, but the interface contract, the security posture, the performance expectations, and the data ownership rules must not. Those invariants should be expressed as executable architectural fitness functions: CI checks that run against every diff and fail when a regenerated module violates a boundary, imports a disallowed dependency, or silently shifts the safety class of a component. Traceability IDs (technique 3) are the scaffolding that makes these fitness functions auditable: when every REQ-ID appears in the spec, the task list, the generated code, and the CI check, the architect can query “show me everything this requirement touched” across the full lifecycle.
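
One of the fitness functions named above, the disallowed-import check, can be sketched as follows. It is regex-based rather than a real parser and assumes root-relative module specifiers such as `src/billing/db`. The rule format and function names are invented for this example.

```typescript
// Hypothetical fitness function: flags imports that cross a forbidden module boundary.
interface BoundaryRule {
  module: string; // path prefix the rule applies to
  forbidden: string[]; // import prefixes that module must never use
}

interface Violation {
  file: string;
  importPath: string;
  rule: string;
}

// Matches `import ... from "x"` and bare `import "x"`. Not a full parser.
const IMPORT_RE = /import\s+(?:[^"']*?\s+from\s+)?["']([^"']+)["']/g;

function findViolations(files: Record<string, string>, rules: BoundaryRule[]): Violation[] {
  const violations: Violation[] = [];
  for (const file of Object.keys(files)) {
    const applicable = rules.filter((r) => file.indexOf(r.module) === 0);
    if (applicable.length === 0) continue;
    IMPORT_RE.lastIndex = 0; // the regex is global, so reset it for each file
    let m: RegExpExecArray | null;
    while ((m = IMPORT_RE.exec(files[file])) !== null) {
      for (const rule of applicable) {
        for (const prefix of rule.forbidden) {
          if (m[1].indexOf(prefix) === 0) {
            violations.push({ file, importPath: m[1], rule: rule.module });
          }
        }
      }
    }
  }
  return violations;
}

const violations = findViolations(
  {
    "src/ui/cart.ts": 'import { query } from "src/billing/db";\nimport { format } from "src/shared/money";',
    "src/billing/invoice.ts": 'import { query } from "src/billing/db";',
  },
  [{ module: "src/ui/", forbidden: ["src/billing/"] }],
);
// one violation: src/ui/cart.ts importing src/billing/db
```

Because the rules are plain data, they can live next to the meta-spec and be reviewed like any other requirement. The agent can regenerate `src/ui/` freely as long as this check stays green.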

Developer: owns edit drift

The developer shifts from code author to orchestrator and reviewer. Their most important drift responsibility is also their most tempting failure mode: patching the generated code directly to fix a bug, without reflecting the change back into the spec. That is edit drift, and it is the fastest way to make the spec a confident-looking lie. The discipline is to treat the spec as strictly upstream — if the generated output is wrong, the spec is wrong first, and the fix flows downward through a new generation.

The small, validated increment discipline (technique 4) is non-negotiable here: regenerating an entire module in one step is where regeneration drift hides undetected. Breaking work into atomic tasks — one requirement, one diff, one golden test — keeps each checkpoint human-reviewable. Traceability IDs (technique 3) complete the picture: a pre-merge check that surfaces orphan tests or orphan requirements before they decay into invisible drift is the developer’s last line of defence.

Tester / QA: owns runtime drift and makes all drift visible

QA arguably gains the most leverage in SDD. Their job stops being “find bugs in code a human wrote” and becomes “encode the spec as executable oracles — and keep them honest.” The golden test harness (technique 2) is QA’s anti-corruption layer: frozen input/output pairs per acceptance criterion lock what the agent decided and make interpretation and regeneration drift visible the moment an output changes. Runtime conformance probes (technique 5) extend that protection into production, emitting a drift event when a deployed response no longer matches the schema in spec/.

Exploratory testing remains, but its purpose shifts: the goal is to find spec gaps, not implementation bugs. A gap in the spec is a future interpretation drift event waiting to happen.

SDD in regulated environments (ISO 13485, IEC 62304)

Regulated environments do not introduce new drift types — they raise the cost of each one. An interpretation drift event that causes a rework sprint on a standard team might become a nonconformance report and a design change request in a medical device company. Regeneration drift that deletes a passing behaviour becomes a regression against a verified requirement, triggering re-approval evidence. Runtime drift that silently changes a response format becomes a post-market surveillance signal with patient safety implications.

The roles described above do not disappear in regulated contexts. The PO who prevents interpretation drift, the architect who guards regeneration boundaries, the developer who fights edit drift, the QA engineer who makes drift visible — they become formally accountable parties in the quality management system, with their drift responsibilities mapped to specific IEC 62304 and ISO 13485 obligations.

Regulated software — medical devices under ISO 13485 and IEC 62304, but also DO-178C avionics, IEC 61508 industrial safety, and 21 CFR Part 11 — exposes the weakest structural assumptions of SDD. The standards demand things an agent does not, by default, produce: a controlled design history, deterministic traceability from user need to verified code, justified risk controls, and an auditable record of who decided what and why.

The friction points are concrete:

  • Non-deterministic generation vs. deterministic lifecycle. IEC 62304 expects a repeatable lifecycle. An agent that produces different code for the same spec on two runs breaks that expectation unless the regeneration is itself a controlled, signed artifact.
  • Traceability gaps. ISO 13485 §7.3 and IEC 62304 §5.1 require a chain from user need → requirement → design → unit → test → release. Agent-generated code that lacks stable requirement IDs in every artifact will fail an audit.
  • Software safety classification. IEC 62304 classes (A, B, C) drive the rigor of unit testing, integration testing, and architectural documentation. An agent that silently merges Class C logic into a Class A module changes the safety classification of the whole component.
  • AI-specific obligations. IEC 62304 Edition 2 (2026) introduces clause 5.1.15: when AI is part of the development pipeline of the product, the manufacturer must maintain an AI plan covering data, training, validation, and known limitations. “An agent wrote it” is now a documented lifecycle event, not an implementation detail.
  • Change control. Every regeneration is a change. Under section 6 of IEC 62304, that triggers impact analysis, regression evidence, and re-approval, not a silent git push.

// SDD pipeline mapped to an IEC 62304 controlled lifecycle
flowchart TB
UN[User need / clinical intent] --> SR[Software requirement REQ-ID]
SR --> RA[Risk analysis ISO 14971]
RA --> SC[Safety class A/B/C]
SC --> SP[Structured spec under QMS control]
SP --> AG((Coding agent run signed and logged))
AG --> CD[Generated code with REQ-ID anchors]
CD --> UT[Unit and integration tests]
UT --> VV[V and V evidence pack]
VV --> DHF[Design History File]
DHF --> RL[Release with audit trail]
RL -. post-market drift signal .-> RA
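
In the pipeline above, every REQ-ID picks up a safety class before the spec reaches the agent. The sketch below uses that mapping to flag modules whose generated code anchors a higher class than the module declares. It rests on a simplifying assumption that a module's effective class is the highest class among the REQ-IDs found in its source. The types and function names are hypothetical, and unknown IDs are ignored here although a real check would probably reject them.

```typescript
// Hypothetical check: detects a module whose REQ-ID anchors imply a higher safety class.
type SafetyClass = "A" | "B" | "C";
const RANK: Record<SafetyClass, number> = { A: 0, B: 1, C: 2 };

interface ModuleDecl {
  name: string;
  declaredClass: SafetyClass;
  source: string; // generated code containing REQ-ID anchors
}

function effectiveClass(source: string, reqClass: Record<string, SafetyClass>): SafetyClass {
  const ids = source.match(/REQ-[A-Z]+-\d+/g) ?? [];
  let highest: SafetyClass = "A";
  for (const id of ids) {
    const cls = reqClass[id];
    if (cls !== undefined && RANK[cls] > RANK[highest]) highest = cls;
  }
  return highest;
}

function findClassEscalations(modules: ModuleDecl[], reqClass: Record<string, SafetyClass>): string[] {
  return modules
    .filter((m) => RANK[effectiveClass(m.source, reqClass)] > RANK[m.declaredClass])
    .map((m) => m.name);
}

const escalated = findClassEscalations(
  [
    { name: "ui/summary", declaredClass: "A", source: "// REQ-UI-3\n// REQ-DOSE-1\nexport const x = 1;" },
    { name: "dosing/calc", declaredClass: "C", source: "// REQ-DOSE-1\nexport const y = 2;" },
  ],
  { "REQ-UI-3": "A", "REQ-DOSE-1": "C" },
);
// escalated is ["ui/summary"]
```

A flagged module is not necessarily wrong. It signals that the classification decision needs a human with authority to either reclassify the module or move the logic.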

Practical techniques that make SDD survive an audit:

  1. Pin and log every generation (architect-owned; prevents regeneration drift from becoming an untraceable lifecycle event). Record agent model, version, prompt hash, spec commit SHA, and seed. The Design History File must be able to answer “show me exactly what produced this binary.” Treat the spec repository as a controlled document under ISO 13485 §4.2.
  2. Hard-link requirement IDs to safety class (architect + developer; prevents interpretation drift from crossing safety classification boundaries without audit evidence). Every REQ-ID carries an explicit class. A CI check rejects diffs where a Class A module imports symbols that implement a Class B/C requirement without an approved interface contract.
  3. Promote golden tests to verification records (QA-owned; turns the drift detector into the compliance artifact). Each golden case maps 1:1 to a verification protocol entry. The test report is the V&V evidence pack — not a screenshot or a Slack thread.
  4. Treat the agent as a tool of known limitation (architect + QA; scopes where interpretation drift is acceptable and where the agent must not operate). The AI plan (clause 5.1.15 in the friction points above) is the natural home for this: what the agent may generate, what it must never touch (cryptographic primitives, dosing calculations), and how its output is validated.
  5. Use two-channel review for safety-critical paths (developer + independent reviewer; exists specifically to catch edit drift and interpretation drift before they enter the Design History File). For Class C code, the reviewer’s sign-off is recorded against the requirement ID. Generation and review must not share the same run.
  6. Freeze, do not regenerate, at release (all roles; the release boundary is where regeneration drift and runtime drift must stop). Future fixes regenerate from a branched spec with full change control, not from main.
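
Technique 1 in the list above can be sketched as a small generation record. The field names are illustrative, and the content hash only provides tamper evidence. It is not a cryptographic signature, which would need a key and a signing step not shown here. The only library call used is `createHash` from Node's built-in `node:crypto` module.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record of one agent run, suitable for storing alongside the generated diff.
interface GenerationInput {
  model: string;
  modelVersion: string;
  prompt: string;
  specCommit: string; // commit SHA of the spec repository
  seed: number | null; // null when the agent exposes no seed
}

interface GenerationRecord {
  model: string;
  modelVersion: string;
  promptHash: string; // SHA-256 of the exact prompt text
  specCommit: string;
  seed: number | null;
  recordHash: string; // SHA-256 over the fields above
}

function sha256(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

function recordGeneration(input: GenerationInput): GenerationRecord {
  const fields = {
    model: input.model,
    modelVersion: input.modelVersion,
    promptHash: sha256(input.prompt),
    specCommit: input.specCommit,
    seed: input.seed,
  };
  // Key order is fixed by the literal above, so the serialization is stable.
  return { ...fields, recordHash: sha256(JSON.stringify(fields)) };
}

const input: GenerationInput = {
  model: "example-agent", // placeholder values, not real identifiers
  modelVersion: "1.0.0",
  prompt: "Implement REQ-CHK-7 as specified.",
  specCommit: "0123abc",
  seed: 42,
};
const record = recordGeneration(input);
// The same input always yields the same recordHash; any changed field yields a different one.
```

Storing the record next to the merge commit is one way to let the Design History File answer which model, prompt, and spec revision produced a given change.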

The honest assessment: SDD does not magically deliver compliance, and it can actively harm it if drift control is weak. But a spec-anchored SDD pipeline — with signed generations, ID-locked traceability, and golden-test verification — maps more cleanly onto IEC 62304 than the ad-hoc “code + retrospective documentation” pattern most regulated teams actually run today. The compliance burden moves from chasing the code to maintaining the spec, which is exactly where the standards already want it to be.

Trade-offs

| Approach | Advantage | Drawback |
| --- | --- | --- |
| Spec-first | Cheap to adopt; clear intent at kickoff | Spec rots the moment humans patch the code |
| Spec-anchored | Living source of truth; drift is detectable | Requires discipline and tooling on every change |
| Spec-as-source | Regeneration is safe; refactors are trivial | Heavy upfront investment; small changes feel disproportionate |
| Prose-only specs | Fast to write; flexible | Drift is invisible until production |
| Executable specs | Drift surfaces in CI | Authoring cost is real; not every requirement compiles |
| SDD in regulated domains | Spec-as-truth aligns with ISO 13485 / IEC 62304 intent | Demands signed generations, ID traceability, and tool qualification |

SDD is not a free lunch. It moves the cost of correctness earlier — into the spec, the contracts, and the harness — in exchange for cheaper change later. Teams that pay that cost honestly get an auditable, regenerable system. Teams that don’t end up with a confident-looking spec, a divergent implementation, and an agent happy to keep widening the gap.
