Two Lines of Defence Against Drift in SDD-Built Distributed Systems

Drift in a monolith is annoying. Drift in a distributed system is dangerous, because it multiplies: each service boundary is an opportunity for a spec to diverge from reality, and those divergences compose. Add spec-driven development (SDD) to the picture, where generated code is the downstream artifact of a spec rather than the source of truth, and you have a pipeline in which a single ambiguous requirement can corrupt several services at once, silently, with no warning until something fails in production.

The good news is that two techniques, applied at different points in the lifecycle, cover most of the damage surface. Neither is exotic. Both are underused.

The gap that golden tests fill

A golden test is a frozen input/output pair committed alongside the code. The test passes when the current output exactly matches the frozen one. Any divergence — changed field name, missing key, reordered array — is a diff, and that diff is the drift report.

This is different from a regular assertion. A regular test says “this field should be non-null.” A golden test says “this field should be exactly 'pending', this array should have exactly three items in this order, and this response should be byte-for-byte identical to what the spec said it would be when we froze this case.”
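To make "the diff is the drift report" concrete, here is a minimal sketch using only the Python standard library. The helper names (`canonical`, `drift_report`) and the sample payloads are illustrative assumptions, not part of any particular test harness.

```python
import difflib
import json


def canonical(value):
    """Serialize to sorted-key JSON lines so the diff is deterministic."""
    return json.dumps(value, indent=2, sort_keys=True).splitlines()


def drift_report(golden, actual, name="golden"):
    """Return a unified diff as a list of lines; an empty list means no drift."""
    return list(
        difflib.unified_diff(
            canonical(golden),
            canonical(actual),
            fromfile=f"{name} (frozen)",
            tofile=f"{name} (actual)",
            lineterm="",
        )
    )


# Hypothetical frozen output and two drifted variants.
golden = {"status": "pending", "items": ["a", "b", "c"]}
reordered = {"status": "pending", "items": ["b", "a", "c"]}
missing_key = {"items": ["a", "b", "c"]}

assert drift_report(golden, golden) == []
for line in drift_report(golden, reordered):
    print(line)
```

Sorting keys removes noise from object key order, while array order is preserved, so a reordered array still shows up as drift.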

The reason this matters in SDD is regeneration. When the agent re-generates a module — because a requirement changed, a dependency was upgraded, or the architect decided to restructure a boundary — the new implementation might pass every unit test while quietly dropping a behaviour that lived in the code but not in the spec. Golden tests catch that. They are the spec’s anti-corruption layer at the output level.

// tests/golden/checkout.spec.ts
import { runScenario } from "../harness";
import golden from "./checkout.golden.json";

test("checkout.scenario-7 matches spec/REQ-CHK-7", async () => {
  const actual = await runScenario("checkout/scenario-7");
  // toStrictEqual, unlike toEqual, also fails on keys whose value is undefined.
  expect(actual).toStrictEqual(golden); // the diff is the drift report
});

The naming convention is load-bearing: the test name embeds the requirement ID. That connects the golden file, the test run, and the spec entry into a single traceable chain. When the test fails, you don’t have to hunt for the requirement — the name tells you exactly where to look.
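One way to keep that chain machine-readable is to extract the ID from the test name when a run fails. The sketch below assumes IDs follow the `REQ-<AREA>-<number>` shape seen in `REQ-CHK-7`; the pattern and the function name are assumptions for illustration.

```python
import re

# Assumed ID shape: REQ-<uppercase area>-<number>, e.g. REQ-CHK-7.
REQ_ID = re.compile(r"\bREQ-[A-Z]+-\d+\b")


def requirement_ids(test_name):
    """Return every requirement ID embedded in a test name, in order."""
    return REQ_ID.findall(test_name)


print(requirement_ids("checkout.scenario-7 matches spec/REQ-CHK-7"))
```

A lint step could reject any golden test whose name yields an empty list, so that no frozen case is left untraceable.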

A golden update is not a free operation. Updating the frozen file should require the same review as updating the spec: you are changing what the system is allowed to do, not just how it does it. Teams that auto-accept golden diffs in CI have turned their anti-corruption layer into wallpaper.
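A small guard can make that discipline mechanical. The following sketch rewrites a golden file only when explicitly asked and refuses to do so in CI. The `UPDATE_GOLDEN` variable is a hypothetical convention, and the `CI` variable is assumed to be set by the CI system, as many do.

```python
import json
import os
from pathlib import Path


def check_golden(path, actual, env=os.environ):
    """Compare against the frozen file; rewrite it only on explicit request."""
    path = Path(path)
    if env.get("UPDATE_GOLDEN") == "1":  # hypothetical opt-in flag
        if env.get("CI"):
            raise RuntimeError("refusing to update golden files in CI")
        path.write_text(json.dumps(actual, indent=2, sort_keys=True) + "\n")
        return True
    frozen = json.loads(path.read_text())
    return frozen == actual
```

With this shape, an update can only happen on a developer machine, and the rewritten file then appears in the pull request diff where it can be reviewed like a spec change.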

Where golden tests stop working

Golden tests run against code. They catch drift that originates in the code itself — regeneration errors, refactoring side effects, agent misinterpretations. They do not run against a live environment, so they cannot catch what happens after merge: a DBA migrates a column type, a downstream API quietly renames a field, an ops engineer toggles a feature flag that changes the response shape, a config change introduces a new nullable field.

That category of drift — infrastructure drift, config drift, upstream API drift — is what runtime conformance probes exist to catch.

// Where each technique sits in the drift lifecycle

flowchart LR
S[Spec] --> A((Coding agent))
A --> C[Generated code]
C --> G{Golden tests}
G -- pass --> M[Merge]
G -- diff --> DR[Drift report]
M --> D[Deployed system]
D --> P{Runtime probes}
P -- pass --> OK[Conformant]
P -- divergence --> DR
DR --> S

Runtime conformance probes

A probe is not a test in the CI sense. It is a scheduled, continuous surveillance process that interrogates a live environment — staging, canary, or production — and validates what it gets back against the schema in spec/. It runs independently of deployments. It fires when the world changes, not just when the code does.

The simplest form: hit an endpoint on a schedule, validate the response against your OpenAPI spec, emit a drift event if validation fails.

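# probes/orders_probe.py
import os

import jsonschema
import requests
import yaml

# Load the Order schema from the version-controlled OpenAPI contract.
with open("spec/contracts/orders.openapi.yaml") as f:
    spec = yaml.safe_load(f)
schema = spec["components"]["schemas"]["Order"]

# Assumption: the probe's credential is supplied via the PROBE_TOKEN environment variable.
token = os.environ["PROBE_TOKEN"]

response = requests.get(
    "https://staging.example.com/orders/12345",
    headers={"Authorization": f"Bearer {token}"},
    timeout=10,  # a hung endpoint should not hang the probe
)
response.raise_for_status()  # fail loudly rather than validating an error body

try:
    jsonschema.validate(response.json(), schema)
except jsonschema.ValidationError as e:
    # emit_drift_event is assumed to be provided by the probe harness.
    emit_drift_event(requirement="REQ-ORD-3", field=list(e.path), detail=e.message)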

The spec says total_cents is integer, required. A DBA migrates the column to NUMERIC(12,2). The ORM serializes it as 10.50. No code changed. All tests pass. The probe fires.
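The scenario can be reproduced without a network or a validator library. This is a hand-rolled check, not the `jsonschema` call a real probe would use, and the payloads are made up for illustration.

```python
import json


def is_json_integer(value):
    """True for JSON numbers with no fractional part (booleans excluded)."""
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    return isinstance(value, float) and value.is_integer()


def check_order(payload):
    """Return violations of the rule: total_cents is a required integer."""
    if "total_cents" not in payload:
        return ["total_cents: required field is missing"]
    if not is_json_integer(payload["total_cents"]):
        return [f"total_cents: expected integer, got {payload['total_cents']!r}"]
    return []


# Hypothetical responses before and after the column migration.
before = json.loads('{"id": "12345", "total_cents": 1050}')
after = json.loads('{"id": "12345", "total_cents": 10.50}')

assert check_order(before) == []
print(check_order(after))  # ['total_cents: expected integer, got 10.5']
```

The application code is identical in both cases; only the value coming out of storage changed, which is why a code-level test has nothing to fail on.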

Consumer-driven contracts are the most structured form of this technique. Each consumer (frontend, mobile, a downstream service) publishes a contract describing what it expects from a provider. The provider runs those contracts against a live instance on a schedule, not just at deploy time. Pact is a common tool for this. The important discipline is that consumer contracts live in spec/ or alongside it, version-controlled, not in a separate system that drifts independently.
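The core idea can be sketched without any tooling. The code below is a hand-rolled illustration, not Pact's API: the `Contract` class, the consumer names, and the field lists are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    consumer: str
    fields: dict  # field name -> expected Python type


def verify(contract, response):
    """Return human-readable violations of one consumer's expectations."""
    problems = []
    for name, expected in contract.fields.items():
        if name not in response:
            problems.append(f"{contract.consumer}: missing field {name!r}")
        elif not isinstance(response[name], expected):
            problems.append(
                f"{contract.consumer}: {name!r} should be {expected.__name__}, "
                f"got {type(response[name]).__name__}"
            )
    return problems


# Two hypothetical consumers that depend on different parts of the same response.
mobile = Contract("mobile", {"id": str, "status": str})
web = Contract("web", {"id": str, "total_cents": int})

response = {"id": "12345", "status": "pending", "total_cents": 10.5, "extra": True}
for contract in (mobile, web):
    print(contract.consumer, verify(contract, response))
```

Each consumer asserts only on the fields it reads, so extra fields in the response are tolerated and a failure names the consumer that is actually affected.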

Event schema probes are the same idea applied to message queues. Subscribe to a Kafka topic, validate each message against the schema from spec/events/, emit a drift event on failure. This catches the case where a developer adds a field to an event without updating the spec — or removes one without realising a consumer depends on it.

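// probes/order-events-probe.ts
import Ajv from "ajv";
import schema from "../spec/events/order-created.schema.json";

// Compile once. `consumer` and `emitDriftEvent` are assumed to come from the probe harness.
const validate = new Ajv({ allErrors: true }).compile(schema);

consumer.on("message", (msg) => {
  let payload: unknown;
  try {
    payload = JSON.parse(msg.value.toString());
  } catch {
    // A message that is not valid JSON is itself drift; report it instead of crashing.
    emitDriftEvent({ event: "OrderCreated", errors: [{ message: "invalid JSON" }], offset: msg.offset });
    return;
  }
  if (!validate(payload)) {
    emitDriftEvent({
      event: "OrderCreated",
      errors: validate.errors,
      offset: msg.offset,
    });
  }
});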

Behavioral probes go beyond schema. If the spec says “a cancelled order must never transition back to pending,” a probe can enforce that by creating an order, cancelling it, attempting to reactivate it, and asserting a 422. A feature flag enabling a grace-period reactivation flow gets caught here, even if the schema is still valid.
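The cancel-then-reactivate sequence above can be sketched as follows. `FakeOrders` is an in-memory stand-in so the example runs on its own; a real probe would call the live API instead. The requirement ID `REQ-ORD-9` and the flag name are hypothetical.

```python
class FakeOrders:
    """In-memory stand-in for a real API client; purely illustrative."""

    def __init__(self, grace_period_flag=False):
        self.grace_period_flag = grace_period_flag
        self.status = {}

    def create(self):
        order_id = len(self.status) + 1
        self.status[order_id] = "pending"
        return order_id

    def cancel(self, order_id):
        self.status[order_id] = "cancelled"
        return 200

    def reactivate(self, order_id):
        if self.status[order_id] == "cancelled" and not self.grace_period_flag:
            return 422  # the forbidden transition is rejected
        self.status[order_id] = "pending"
        return 200


def probe_cancel_is_terminal(client, requirement="REQ-ORD-9"):
    """Create, cancel, then try to reactivate; return a drift event or None."""
    order_id = client.create()
    client.cancel(order_id)
    status_code = client.reactivate(order_id)
    if status_code == 422:
        return None
    return {
        "requirement": requirement,
        "detail": f"reactivating a cancelled order returned {status_code}, expected 422",
    }


assert probe_cancel_is_terminal(FakeOrders()) is None
print(probe_cancel_is_terminal(FakeOrders(grace_period_flag=True)))
```

Every response in the flagged run would still pass schema validation; only the sequence of calls reveals that the invariant no longer holds.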

Database schema probes close the gap at the storage layer. A nightly comparison of the live database schema against a spec/schema/ snapshot will catch the dropped NOT NULL constraint, the quietly widened column type, the new nullable field that no application code checks for yet.
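A minimal sketch of such a comparison, using an in-memory SQLite database so it runs anywhere. The `expected` snapshot stands in for a file under spec/schema/, and the table and column names are assumptions; a production probe would query the real database's catalog instead.

```python
import sqlite3


def live_schema(conn, table):
    """Map column name -> (declared type, NOT NULL flag) for one table."""
    # The table name is trusted input here; PRAGMA arguments cannot be bound.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Row layout: cid, name, type, notnull, dflt_value, pk
    return {row[1]: (row[2].upper(), bool(row[3])) for row in rows}


def schema_drift(expected, actual):
    """List (column, expected, actual) for every column that differs."""
    drift = []
    for column in sorted(expected.keys() | actual.keys()):
        if expected.get(column) != actual.get(column):
            drift.append((column, expected.get(column), actual.get(column)))
    return drift


# Hypothetical snapshot of what the spec promises.
expected = {"id": ("INTEGER", True), "total_cents": ("INTEGER", True)}

# Simulated live database after a migration widened the column and dropped NOT NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER NOT NULL, total_cents NUMERIC(12,2))")

for column, want, got in schema_drift(expected, live_schema(conn, "orders")):
    print(f"{column}: spec says {want}, database has {got}")
```

Columns present on only one side show up with `None` on the other, so added and dropped columns are reported the same way as changed ones.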

The key property all probes share: they produce an actionable signal, not a silent failure. A drift event should be routable to the person responsible for that requirement — which is why traceability IDs matter. A probe that emits "REQ-ORD-3 violated" is actionable. A probe that emits "JSON validation error" is noise.
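Routing by requirement ID can be as simple as a lookup on the ID's area prefix. The owner table and team names below are hypothetical, and events without an ID fall through to a triage queue.

```python
# Hypothetical mapping from requirement area to the owning team.
OWNERS = {"REQ-ORD": "orders-team", "REQ-CHK": "checkout-team"}


def route(event, owners=OWNERS, fallback="triage"):
    """Pick a recipient from the requirement ID; unknown or missing IDs go to triage."""
    requirement = event.get("requirement", "")
    area = requirement.rsplit("-", 1)[0]  # "REQ-ORD-3" -> "REQ-ORD"
    return owners.get(area, fallback)


print(route({"requirement": "REQ-ORD-3", "detail": "total_cents is not an integer"}))
print(route({"detail": "JSON validation error"}))
```

The second event lands in triage precisely because it carries no ID, which is the "noise" case described above.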

Connecting the two techniques

Golden tests and runtime probes are not redundant. They catch different drift classes at different lifecycle stages. A golden test failure means the code diverged from the spec during generation or refactoring — the fix flows back through the spec. A probe failure means the live environment diverged from the spec after merge — the fix might be a schema update, a config rollback, or a conversation with a DBA.

The shared mechanism is the requirement ID. When both a golden test and a probe reference REQ-ORD-3, a single failing ID surfaces the drift regardless of whether it originated in code or infrastructure. The spec is the single ledger; both techniques write their failures against it.
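As a rough sketch of that ledger, failures from both techniques can be grouped under the requirement they violate. The failure records and the `source` labels here are illustrative assumptions about what each technique would emit.

```python
from collections import defaultdict


def ledger(failures):
    """Group failures by requirement ID, recording which technique reported each."""
    by_requirement = defaultdict(set)
    for failure in failures:
        by_requirement[failure["requirement"]].add(failure["source"])
    return {req: sorted(sources) for req, sources in by_requirement.items()}


# Hypothetical failures collected from one CI run and one probe cycle.
failures = [
    {"requirement": "REQ-ORD-3", "source": "golden"},
    {"requirement": "REQ-ORD-3", "source": "probe"},
    {"requirement": "REQ-CHK-7", "source": "golden"},
]

print(ledger(failures))
```

An ID reported by both sources suggests the spec entry itself deserves a look, while an ID reported only by a probe points toward the environment.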

Trade-offs

Technique	Advantage	Drawback
Golden tests	Catches regeneration and refactoring drift at CI time	Requires discipline on updates — auto-accepting diffs defeats the purpose
Runtime probes (REST/schema)	Catches infrastructure and config drift that no code test can see	Needs real environments; flaky if staging data is unstable
Consumer-driven contracts (Pact)	Consumer expectations are explicit and versioned	Overhead of maintaining contracts per consumer; organizational buy-in required
Event schema probes	Validates async boundaries that HTTP tests miss	Requires a live broker; replay logic adds complexity
Behavioral probes	Enforces business invariants, not just schema shape	Slow to write; depend on test data setup in the live environment
No runtime probes	Zero infrastructure cost	Spec drift after merge is invisible until a user reports it

The honest position: golden tests alone give you a false sense of security in a distributed system. The spec can be perfectly encoded in frozen outputs and still be wrong by Tuesday, because the world the spec describes is not static. Runtime probes are the mechanism that acknowledges that fact and does something about it.