The Brutal Stella Review: When a Governance Framework Audits Itself
I ran my own AI-PM framework against its own codebase. In 24 hours it found 10 bugs, including one CRITICAL that had been silently fragmenting every audit trail. Here's what dogfooding actually surfaces.
The short version
I pointed Stella Protocol's own review skill at the Stella Protocol repo. It found 10 bugs in 24 hours — one CRITICAL, three HIGH. The CRITICAL was a silent contradiction between two governance files that had been fragmenting every audit trail for weeks. I fixed it with a structural output gate, not more prose.
What Stella Protocol is
Stella Protocol is my open-source AI-PM methodology — a set of skill files, governance docs, and structural gates that let a PM drive an AI coding agent from raw idea to shipped product without losing the plot. It ships as an NPM package (stella-protocol) and is listed on the Claude marketplace. Named after the Vegapunk satellites in One Piece: Shaka writes PRDs, Pythagoras does architecture, ODA handles design, Lilith runs the review, Cipher Pol — the scope-drift monitor — watches for unapproved surface-area changes, and Buster Call — the quality/security veto — blocks ship on real red flags. The whole thing is PM-first: the human owns intent and decisions, the AI executes.
Why I dogfooded it
Every version before 0.9 shipped with me using it on real projects (Stoka, House of Riddle, Amal Najib) and tweaking between sessions. That catches usage bugs. It does not catch framework bugs — contradictions between skill files, stale version numbers in manifest files, missing enforcement on the rules the framework preaches to others.
So I ran Stella’s own stella-review skill against the Stella Protocol repo itself. Sanity check: if the framework can audit scope, decision logs, and security for other projects, it should survive auditing its own source.
The 10 findings
In 24 hours (one review pass plus two follow-ups), Stella found 10 bugs in itself:
1 CRITICAL
- The Cipher Pol skill file routed scope-change logs to brain/scope-changes.md, but the governance doc protocol/governance/cipher-pol.md said brain/vivre-cards.md, the append-only decision log. Two different source-of-truth files for the same audit trail.
3 HIGH
- package.json declared 0.4.1 while marketplace.json still pinned 0.3.0. Users installing from the marketplace got a version older than the one advertised in release notes (a preflight sketch that catches this follows the list).
- install.js crashed without a readable error when the target directory had a read-only brain/ (common if a prior install was aborted mid-write).
- The init template stamped YYYY-MM-DD literally into brain/log-pose.md instead of replacing it with the real date. Every brand-new project had a placeholder as its first timestamp.
4 MEDIUM
- The install script did not validate that the protocol/skills/ directory actually contained the required skill files before claiming success.
- “Buster Call” was spelled three different ways across skill files, governance docs, and README: Buster Call, Buster-Call, busterCall. Agents matching by string missed triggers.
- Two more in the same family.
2 LOW
- Dead reference links and a stale badge.
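The mechanical findings reduce to a script, not vigilance. Here is a minimal preflight sketch in Node; the top-level version field in marketplace.json and the required-skill list are my assumptions about the repo layout, not Stella's actual tooling:

```js
// preflight.js: a release-gate sketch. ASSUMPTIONS: marketplace.json exposes
// a top-level "version" field, and the skill list below is an illustrative
// subset, not the real manifest.
const fs = require("fs");

const pkg = JSON.parse(fs.readFileSync("package.json", "utf8"));
const market = JSON.parse(fs.readFileSync("marketplace.json", "utf8"));
const errors = [];

// HIGH finding: the two version declarations must agree before anything ships.
if (pkg.version !== market.version) {
  errors.push(`version drift: package.json=${pkg.version}, marketplace.json=${market.version}`);
}

// MEDIUM finding: the skills directory must contain what the installer claims.
const requiredSkills = ["cipher-pol.md", "stella-review.md"]; // hypothetical subset
for (const skill of requiredSkills) {
  if (!fs.existsSync(`protocol/skills/${skill}`)) {
    errors.push(`missing skill file: protocol/skills/${skill}`);
  }
}

if (errors.length > 0) {
  console.error(errors.join("\n"));
  process.exit(1); // block the release instead of warning
}
```

Wired into a prepublish hook, a check like this turns "two source-of-truth files" from a proofreading task into a failed build.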
The CRITICAL, in depth
Cipher Pol’s one job: when an agent is about to create a new route, API endpoint, or DB table not in the approved PRD, classify the drift (INTEL / ALERT / INTERCEPT) and log it.
The skill file protocol/skills/cipher-pol.md said:
On INTERCEPT, append entry to brain/scope-changes.md with:
- Timestamp, phase, proposed change, PRD gap, classification
The governance reference protocol/governance/cipher-pol.md — which I wrote two weeks later, during a refactor — said:
Scope deltas are recorded in brain/vivre-cards.md under the
"Scope Drift" section, append-only.
Both files had shipped. Both were being read by agents. On some projects, scope alerts landed in scope-changes.md. On others — same framework version, different session — they landed in vivre-cards.md. On one project (mine), they split: half the drifts in one file, half in the other, no single file giving the full audit trail.
I only caught it because Stella’s own review pass cross-referenced the two files and flagged the contradiction. If I had been auditing only the code, I would have missed it. The bug was in the prose.
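The cross-reference itself is mechanizable. A sketch of the check, under the assumption that every log destination a doc names looks like brain/<name>.md; the real check runs inside the stella-review skill, not as a standalone script:

```js
// drift-check.js: sketch of the cross-reference that surfaced the CRITICAL.
// ASSUMPTION: log destinations always match brain/<name>.md.
const fs = require("fs");

function logTargets(path) {
  const text = fs.readFileSync(path, "utf8");
  // Collect every brain/<name>.md path the doc names as a destination.
  return new Set(text.match(/brain\/[\w-]+\.md/g) ?? []);
}

const skill = logTargets("protocol/skills/cipher-pol.md");
const governance = logTargets("protocol/governance/cipher-pol.md");

// Flag any destination that one file names and the other does not.
for (const target of [...skill, ...governance]) {
  if (!skill.has(target) || !governance.has(target)) {
    console.error(`contradiction: ${target} appears in only one of the two docs`);
    process.exitCode = 1;
  }
}
```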
The insight
Prose rules in skill files are not enforceable when the framework has one author shipping fast. I can hold the full mental model for a week. By week three, I have edited one doc and not the other, and the framework now disagrees with itself. Agents don’t resolve contradictions — they pick whichever file they read last, or average the two, or do something weirder.
Documentation drift in a normal codebase is a papercut. In a governance framework, drift is a silent failure of the thing the framework exists to prevent. The users of my framework (indie builders and PMs running AI coding agents) inherit the contradiction without knowing it exists. They get intermittent behavior and no way to diagnose it, because the contradiction is not in code they read — it’s in prompt context they never see.
The other nine findings had the same root cause, just lower blast radius. Version number mismatch in marketplace.json? Two source-of-truth files, no structural link. Spelling of “Buster Call”? Same string repeated in four files, no single constant (the lint sketch below shows how small that fix is). YYYY-MM-DD placeholder? A template read by an installer that never asked “is this a real date?” Every bug was a case where I relied on my own memory to keep multiple places in sync.
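The spelling finding, for instance, reduces to a one-constant lint. A sketch, with an illustrative file list (the real repo has more surfaces than these two):

```js
// name-lint.js: one canonical spelling, every surface checked against it.
// ASSUMPTION: the file list is illustrative, not the full set of surfaces.
const fs = require("fs");

const CANONICAL = "Buster Call";
const badSpellings = /Buster-Call|busterCall/g; // the variants the review found

for (const file of ["README.md", "protocol/skills/cipher-pol.md"]) {
  const hits = fs.readFileSync(file, "utf8").match(badSpellings);
  if (hits) {
    console.error(`${file}: found ${[...new Set(hits)].join(", ")}; use "${CANONICAL}"`);
    process.exitCode = 1;
  }
}
```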
The structural fix in v0.9.0
I did not fix this by proofreading harder. I added an EXIT GATE — a structural requirement baked into the skill’s output schema. Cipher Pol’s skill file now must output a block of the form:
## Scope Drift Log
- Written to: brain/scope-changes.md
- Entry appended: YES
- Timestamp: 2026-04-18T14:22:00Z
If the agent cannot produce that block with a real timestamp and a real file path, it cannot proceed. The governance doc references the same block. One source of truth, enforced at output time, not at review time.
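What "cannot proceed" means mechanically: a validator parses the block and rejects anything without a parseable timestamp and a concrete path. A sketch against the format above; the function name and return shape are mine, not part of the shipped protocol:

```js
// exit-gate.js: sketch of enforcing the Scope Drift Log block at output time.
// ASSUMPTION: the agent's full response is available as one string; field
// names mirror the block format shown above.
function validateScopeDriftGate(output) {
  const match = output.match(
    /## Scope Drift Log\s*\n- Written to: (brain\/[\w-]+\.md)\s*\n- Entry appended: (YES|NO)\s*\n- Timestamp: (\S+)/
  );
  if (!match) return { ok: false, reason: "gate block missing or malformed" };

  const [, path, appended, timestamp] = match;
  if (appended !== "YES") return { ok: false, reason: "no entry was appended" };
  if (Number.isNaN(Date.parse(timestamp))) {
    // Catches literal placeholders like YYYY-MM-DD as well as garbage.
    return { ok: false, reason: `timestamp does not parse: ${timestamp}` };
  }
  return { ok: true, path, timestamp };
}
```

A placeholder date fails Date.parse, so the same gate that fixed the CRITICAL also makes the YYYY-MM-DD class of bug unshippable.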
More on why prose rules fail and EXIT GATEs work: EXIT GATE, Not Prose.
Post-fix re-review
After shipping v0.9.0, I ran stella-review against the repo again. It caught a new drift I had introduced during the fix: I had added an EXIT GATE to Cipher Pol but forgotten to add one to punk-records, which has the same file-routing problem (updates both log-pose.md and vivre-cards.md). Without the gate, a future refactor could split the routes again.
That is the sign a governance framework is working: the re-review finds the next tier of drift, not the same tier you just fixed.
Key Takeaways
- Dogfood the framework on itself. If your governance rules break when applied to their own author, you haven’t written rules — you’ve written aspirations. Run them against your own source before shipping.
- Prose rules drift; structural gates don’t. Any rule you care about enforcing needs to live in the output schema, not in a paragraph the agent can skim past.
- A working audit finds the next tier of drift, not the same one twice. Re-run after every fix. If the same class of bug keeps surfacing, the fix was prose; if a new class surfaces, the fix was structural.
Satellite: Morgans (this post) · Lilith Red (review pass) · Pipeline: AUDIT — Stella Review → Morgans