Why We Replaced Markdown With JSON: The Constitutional Stack
Prose specifications can't be verified. A constitutional stack needs a runtime, and a runtime needs structured data.
TL;DR: Prose specifications get ignored under context pressure. AI agents treat written rules as suggestions, not contracts. The solution is a constitutional stack where humans write principles in markdown and runtimes enforce operational rules as structured data (JSON/YAML).
Core Answer
- Prose rules degrade when context windows fill. Models summarize them away when you need them most.
- A constitutional stack separates human-ratified principles (markdown) from runtime-enforced constraints (JSON).
- Trust scores, lifecycle gates, and resource graphs become data that runtimes check, not text that models interpret.
- The runtime enforces rules the model never has to remember.
What Is the Constitutional Stack Problem?
Every team working with AI agents writes the rules down somewhere. A style guide. A runbook. A CLAUDE.md file. A wiki page.
Then the agent ignores them.
Prose is not a contract. Prose is a suggestion the model interprets freely. We hit this wall building NOMARK's harness. The fix was not a better prompt. The fix was to stop writing the constitution as prose and start writing it as data a runtime enforces.
How Prose Rules Fail Under Pressure
Start with the failure. The failure is specific.
You write "read the file before you edit it" into a rules document. The agent reads the rule. The agent agrees with the rule. Then the agent edits a file it never opened.
Not defiance. Context pressure.
The rule was one sentence among thousands. Under a full context window, the model summarized it away.
You write "never claim a task is done without running the test." Same outcome.
Prose rules degrade when you need them most. Late in long sessions. When the window is full. When the model reconstructs intent from compressed context.
Why This Happens: The problem is not badly written rules. The problem is that nothing checks them. A prose rule is read by the same system it constrains. That is hope, not control.
Section Key Point: Prose rules fail because models interpret them under context pressure, and interpretation degrades when compression happens.
How to Build a Constitution Something Else Reads
The fix: move the rule out of the model's reading. Move it into a runtime's.
A constraint a separate process enforces does not depend on the model remembering it.
We split NOMARK's constitution along this line:
- Human-ratified principles stayed as prose (the Charter, immutable rules about fabrication and false completion). Humans read these. Humans change these.
- Operational rules became structured data. Something other than the model does the checking.
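As a rough illustration of a check that lives outside the model, the "read the file before you edit it" rule from earlier could be enforced like this. This is a minimal sketch, not NOMARK's implementation: `SessionState`, `record_read`, and `allow_edit` are hypothetical names, and it assumes the runtime sees every read and edit tool call.

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    """Facts the runtime records itself; the model never has to recall them."""
    files_read: set[str] = field(default_factory=set)


def record_read(state: SessionState, path: str) -> None:
    # Assumed to be called by the runtime after each successful read tool call.
    state.files_read.add(path)


def allow_edit(state: SessionState, path: str) -> bool:
    # Pre-edit check: the edit goes through only if this path was read earlier.
    return path in state.files_read


state = SessionState()
record_read(state, "src/app.py")
print(allow_edit(state, "src/app.py"))     # True
print(allow_edit(state, "src/config.py"))  # False
```

The rule no longer competes for space in the context window. It is a set membership test the runtime runs on every edit.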
Lifecycle Manifests as Enforceable Data
The clearest example is the lifecycle manifest.
Work moves through five stages: discover, plan, build, verify, ship. Each stage carries entry criteria, exit criteria, and a trust gate. As data, not description.
The ship stage has a trust gate of 0.8.
A model cannot talk its way past 0.8 the way it can reinterpret "be careful before deploying." The number gets compared. Either the score clears the gate or the stage does not open.
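A manifest like this might look roughly as follows. Only the five stage names and the 0.8 ship gate come from the text above. The field names, the other gate values, the criteria strings, and the `can_enter` helper are placeholders for illustration, and treating "clears the gate" as greater-than-or-equal is an assumption.

```python
import json

# Hypothetical manifest. Gate values other than ship's 0.8 are placeholders.
MANIFEST = json.loads("""
{
  "stages": [
    {"name": "discover", "trust_gate": 0.0, "entry_criteria": []},
    {"name": "plan",     "trust_gate": 0.0, "entry_criteria": ["scope_defined"]},
    {"name": "build",    "trust_gate": 0.0, "entry_criteria": ["plan_approved"]},
    {"name": "verify",   "trust_gate": 0.0, "entry_criteria": ["build_complete"]},
    {"name": "ship",     "trust_gate": 0.8, "entry_criteria": ["tests_passed"]}
  ]
}
""")


def can_enter(stage_name: str, trust_score: float, satisfied: set[str]) -> bool:
    """Return True only if every entry criterion is met and the score clears the gate."""
    stage = next(s for s in MANIFEST["stages"] if s["name"] == stage_name)
    criteria_met = set(stage["entry_criteria"]) <= satisfied
    # Assumption: a score equal to the gate counts as clearing it.
    return criteria_met and trust_score >= stage["trust_gate"]


print(can_enter("ship", 0.79, {"tests_passed"}))  # False
print(can_enter("ship", 0.80, {"tests_passed"}))  # True
```

Nothing in `can_enter` reads the model's explanation of why shipping is fine. It reads two values and compares them.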
Trust Scores as Numeric Constraints
Trust itself becomes a number for the same reason.
"Operate reliably" is prose. A trust score that starts at 1.0, drops by a fixed amount for each breach according to its class, and maps to an autonomy level is data.
Below 0.5 the agent loses autonomous dispatch. Not because the agent judged itself untrustworthy. Because a runtime read the score and closed the capability.
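The mechanics can be sketched in a few lines. The starting score of 1.0 and the 0.5 threshold come from the text. The breach class names, the penalty sizes, and the function names are invented for illustration.

```python
# Hypothetical breach classes and penalties; the real values are not given here.
BREACH_PENALTIES = {"skipped_test": 0.1, "false_completion": 0.3}

INITIAL_SCORE = 1.0        # from the text
AUTONOMY_THRESHOLD = 0.5   # from the text: below this, no autonomous dispatch


def apply_breach(score: float, breach_class: str) -> float:
    """Subtract the fixed penalty for this breach class, flooring at 0.0."""
    penalty = BREACH_PENALTIES[breach_class]
    # Rounding keeps float subtraction from leaving values like 0.39999999999999997.
    return max(0.0, round(score - penalty, 4))


def autonomous_dispatch_allowed(score: float) -> bool:
    # The runtime reads the score; the agent's own assessment is not consulted.
    return score >= AUTONOMY_THRESHOLD


score = INITIAL_SCORE
score = apply_breach(score, "false_completion")
print(score, autonomous_dispatch_allowed(score))  # 0.7 True
score = apply_breach(score, "false_completion")
print(score, autonomous_dispatch_allowed(score))  # 0.4 False
```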
Resource Graphs Stop Hallucinated Infrastructure
The resource graph makes the point sharpest.
Agents hallucinate infrastructure with complete confidence. A database name. An endpoint. An environment variable.
"Don't guess resource names" as a prose rule gets ignored under the exact pressure that produces the guess.
As data, the rule holds without the model carrying it: a graph of verified resources, checked by a hook before any write that touches infrastructure.
The runtime remembers so the model does not have to.
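A hook of this kind might be sketched as below. This is a simplification under stated assumptions: the graph is reduced to a flat inventory keyed by resource kind, and every resource name, the `UnverifiedResourceError` class, and `pre_write_hook` are made up for the example.

```python
# Hypothetical inventory of verified resources. A real graph would also
# carry relationships between resources; this sketch keeps only the names.
VERIFIED_RESOURCES = {
    "database": {"orders_db"},
    "endpoint": {"https://api.example.com/v1/orders"},
    "env_var": {"ORDERS_DB_URL"},
}


class UnverifiedResourceError(Exception):
    """Raised when a write references a resource that is not in the graph."""


def pre_write_hook(kind: str, name: str) -> None:
    # Runs before any write that touches infrastructure.
    if name not in VERIFIED_RESOURCES.get(kind, set()):
        raise UnverifiedResourceError(f"{kind} {name!r} is not in the resource graph")


pre_write_hook("database", "orders_db")  # verified: passes silently

try:
    pre_write_hook("database", "orders_prod_db")  # a confident guess
except UnverifiedResourceError as err:
    print(f"write blocked: {err}")
```

A guessed name fails the lookup no matter how plausible it sounds, because plausibility is not something the hook evaluates.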
Prose tells the model what good looks like. Data makes good the only path the runtime allows.
Section Key Point: Operational constraints become runtime-enforced data (trust scores, lifecycle gates, resource graphs) that function without model memory.
Why Not Write Better Markdown?
The two formats do different jobs. Asking one to do both is the original mistake.
Markdown is for humans who ratify and amend the constitution. It has to be read, argued over, signed.
JSON and YAML are for runtimes that enforce it. They have to be parsed and checked.
We kept the Charter in prose because its audience is human. We moved the manifests, trust score, resource graph, and signal ledger into structured data because their audience is a hook that runs before a tool call.
Prose and Data Work Together
This does not remove prose.
You still need a document that states, in plain language, what the system values. Without it, nobody can amend the system well.
But the document is not the control.
The control is the part a runtime reads. Runtimes read data, not intent.
So the rules that have to hold became data. The gap between what we wrote down and what the agent did closed to the width of what the runtime checks.
Section Key Point: Markdown serves human governance. JSON serves runtime enforcement. Both are necessary, but only structured data creates verifiable constraints.
Frequently Asked Questions
What is a constitutional stack for AI agents?
A constitutional stack separates human-ratified principles (written in prose/markdown) from runtime-enforced operational rules (written as structured data like JSON or YAML). Prose defines values. Data enforces constraints.
Why do prose rules fail for AI agents?
Prose rules fail under context pressure. When context windows fill during long sessions, models compress and summarize. Rules written as text get interpreted away when the agent needs them most. Prose is a suggestion, not a contract.
How does a runtime enforce rules differently than a model?
A runtime checks rules as data before tool execution. The model never has to remember the rule. A trust gate of 0.8 gets compared numerically. The model cannot reinterpret or talk its way past the threshold.
What is a trust score in this context?
A trust score is a numeric value (starting at 1.0) that drops by a fixed amount each time the agent commits a breach, with the amount set by the breach class. When the score falls below a threshold (like 0.5), the runtime automatically revokes autonomous capabilities. The agent does not self-judge.
What is a resource graph?
A resource graph is a verified inventory of infrastructure components (databases, endpoints, environment variables). Before any write touching infrastructure, a hook checks the graph. If the resource is not verified, the write fails. This stops agents from hallucinating infrastructure names.
Do you still need markdown if you use JSON?
Yes. Markdown serves human readers who ratify, amend, and understand the system's values. JSON serves runtimes that parse and enforce constraints. Both audiences matter. The mistake is expecting markdown to do enforcement work.
What are lifecycle manifests?
Lifecycle manifests define stages (discover, plan, build, verify, ship) with entry criteria, exit criteria, and trust gates stored as data. A stage does not open unless criteria are met. This prevents models from skipping verification steps through reinterpretation.
How do you decide what stays prose and what becomes data?
Human-ratified principles and values stay prose. Operational constraints that must be enforced without interpretation become data. If a runtime needs to check it, write it as structured data. If humans need to debate it, write it as prose.
Key Takeaways
- Prose specifications get ignored under context pressure. Interpretation degrades when context compresses.
- A constitutional stack separates human governance (markdown) from runtime enforcement (JSON/YAML).
- Operational rules become data: trust scores, lifecycle gates, resource graphs, signal ledgers.
- Runtimes enforce constraints without requiring model memory or interpretation.
- Markdown and JSON serve different audiences. Both are necessary. Only data creates verifiable control.
- The gap between written rules and agent behavior closes to the width of what the runtime checks.
- Trust gates and numeric thresholds get compared, not reinterpreted.