The Brutal Stella Review: When a Governance Framework Audits Itself
I ran my own AI-PM framework against its own codebase. In 24 hours it found 10 bugs, including one CRITICAL that had been silently fragmenting every audit trail. Here's what dogfooding actually surfaces.
The short version
I pointed Stella Protocol's own review skill at the Stella Protocol repo. It found 10 bugs in 24 hours — one CRITICAL, three HIGH. The CRITICAL was a silent contradiction between two governance files that had been fragmenting every audit trail for weeks. I fixed it with a structural output gate, not more prose.
What Stella Protocol is
Stella Protocol is my open-source AI-PM methodology — a set of skill files, governance docs, and structural gates that let a PM drive an AI coding agent from raw idea to shipped product without losing the plot. It ships as an NPM package (stella-protocol) and is listed on the Claude marketplace. Named after the Vegapunk satellites in One Piece: Shaka writes PRDs, Pythagoras does architecture, ODA handles design, Lilith runs the review, Cipher Pol — the scope-drift monitor — watches for unapproved surface-area changes, and Buster Call — the quality/security veto — blocks ship on real red flags. The whole thing is PM-first: the human owns intent and decisions, the AI executes.
Why I dogfooded it
Every version before 0.9 shipped with me using it on real projects (Stoka, House of Riddle, Amal Najib) and tweaking between sessions. That catches usage bugs. It does not catch framework bugs — contradictions between skill files, stale version numbers in manifest files, missing enforcement on the rules the framework preaches to others.
So I ran Stella’s own stella-review skill against the Stella Protocol repo itself. Sanity check: if the framework can audit scope, decision logs, and security for other projects, it should survive auditing its own source.
The 10 findings
In 24 hours (one review pass plus two follow-ups), Stella found 10 bugs in itself:
1 CRITICAL
- The Cipher Pol skill file routed scope-change logs to brain/scope-changes.md, but the governance doc protocol/governance/cipher-pol.md said brain/vivre-cards.md, the append-only decision log. Two different source-of-truth files for the same audit trail.
3 HIGH
- package.json declared 0.4.1 while marketplace.json still pinned 0.3.0. Users installing from the marketplace got a version older than the one advertised in release notes (a preflight sketch that catches this follows the list).
- install.js crashed without a readable error when the target directory had a read-only brain/ (common if a prior install was aborted mid-write).
- The init template stamped YYYY-MM-DD literally into brain/log-pose.md instead of replacing it with the real date. Every brand-new project had a placeholder as its first timestamp.
4 MEDIUM
- The install script did not validate that the protocol/skills/ directory actually contained the required skill files before claiming success.
- “Buster Call” was spelled three different ways across skill files, governance docs, and README: Buster Call, Buster-Call, busterCall. Agents matching by string missed triggers.
- Two more in the same family.
2 LOW
- Dead reference links and a stale badge.
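The mechanical findings reduce to a script, not vigilance. Here is a minimal preflight sketch in Node; the top-level version field in marketplace.json and the required-skill list are my assumptions about the repo layout, not Stella's actual tooling:

```js
// preflight.js: a release-gate sketch. ASSUMPTIONS: marketplace.json exposes
// a top-level "version" field, and the skill list below is an illustrative
// subset, not the real manifest.
const fs = require("fs");

const pkg = JSON.parse(fs.readFileSync("package.json", "utf8"));
const market = JSON.parse(fs.readFileSync("marketplace.json", "utf8"));
const errors = [];

// HIGH finding: the two version declarations must agree before anything ships.
if (pkg.version !== market.version) {
  errors.push(`version drift: package.json=${pkg.version}, marketplace.json=${market.version}`);
}

// MEDIUM finding: the skills directory must contain what the installer claims.
const requiredSkills = ["cipher-pol.md", "stella-review.md"]; // hypothetical subset
for (const skill of requiredSkills) {
  if (!fs.existsSync(`protocol/skills/${skill}`)) {
    errors.push(`missing skill file: protocol/skills/${skill}`);
  }
}

if (errors.length > 0) {
  console.error(errors.join("\n"));
  process.exit(1); // block the release instead of warning
}
```

Wired into a prepublish hook, a check like this turns "two source-of-truth files" from a proofreading task into a failed build.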
The CRITICAL, in depth
Cipher Pol’s one job: when an agent is about to create a new route, API endpoint, or DB table not in the approved PRD, classify the drift (INTEL / ALERT / INTERCEPT) and log it.
The skill file protocol/skills/cipher-pol.md said:
On INTERCEPT, append entry to brain/scope-changes.md with:
- Timestamp, phase, proposed change, PRD gap, classification
The governance reference protocol/governance/cipher-pol.md — which I wrote two weeks later, during a refactor — said:
Scope deltas are recorded in brain/vivre-cards.md under the
"Scope Drift" section, append-only.
Both files had shipped. Both were being read by agents. On some projects, scope alerts landed in scope-changes.md. On others — same framework version, different session — they landed in vivre-cards.md. On one project (mine), they split: half the drifts in one file, half in the other, no single file giving the full audit trail.
I only caught it because Stella’s own review pass cross-referenced the two files and flagged the contradiction. If I had been auditing only the code, I would have missed it. The bug was in the prose.
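The cross-reference itself is mechanizable. A sketch of the check, under the assumption that every log destination a doc names looks like brain/<name>.md; the real check runs inside the stella-review skill, not as a standalone script:

```js
// drift-check.js: sketch of the cross-reference that surfaced the CRITICAL.
// ASSUMPTION: log destinations always match brain/<name>.md.
const fs = require("fs");

function logTargets(path) {
  const text = fs.readFileSync(path, "utf8");
  // Collect every brain/<name>.md path the doc names as a destination.
  return new Set(text.match(/brain\/[\w-]+\.md/g) ?? []);
}

const skill = logTargets("protocol/skills/cipher-pol.md");
const governance = logTargets("protocol/governance/cipher-pol.md");

// Flag any destination that one file names and the other does not.
for (const target of [...skill, ...governance]) {
  if (!skill.has(target) || !governance.has(target)) {
    console.error(`contradiction: ${target} appears in only one of the two docs`);
    process.exitCode = 1;
  }
}
```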
The insight
Prose rules in skill files are not enforceable when the framework has one author shipping fast. I can hold the full mental model for a week. By week three, I have edited one doc and not the other, and the framework now disagrees with itself. Agents don’t resolve contradictions — they pick whichever file they read last, or average the two, or do something weirder.
Documentation drift in a normal codebase is a papercut. In a governance framework, drift is a silent failure of the thing the framework exists to prevent. The users of my framework (indie builders and PMs running AI coding agents) inherit the contradiction without knowing it exists. They get intermittent behavior and no way to diagnose it, because the contradiction is not in code they read — it’s in prompt context they never see.
The other nine findings had the same root cause, just lower blast radius. Version number mismatch in marketplace.json? Two source-of-truth files, no structural link. Spelling of “Buster Call”? Same string repeated in four files, no single constant (the lint sketch below shows how small that fix is). YYYY-MM-DD placeholder? A template read by an installer that never asked “is this a real date?” Every bug was a case where I relied on my own memory to keep multiple places in sync.
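The spelling finding, for instance, reduces to a one-constant lint. A sketch, with an illustrative file list (the real repo has more surfaces than these two):

```js
// name-lint.js: one canonical spelling, every surface checked against it.
// ASSUMPTION: the file list is illustrative, not the full set of surfaces.
const fs = require("fs");

const CANONICAL = "Buster Call";
const badSpellings = /Buster-Call|busterCall/g; // the variants the review found

for (const file of ["README.md", "protocol/skills/cipher-pol.md"]) {
  const hits = fs.readFileSync(file, "utf8").match(badSpellings);
  if (hits) {
    console.error(`${file}: found ${[...new Set(hits)].join(", ")}; use "${CANONICAL}"`);
    process.exitCode = 1;
  }
}
```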
The structural fix in v0.9.0
I did not fix this by proofreading harder. I added an EXIT GATE — a structural requirement baked into the skill’s output schema. Cipher Pol’s skill file now must output a block of the form:
## Scope Drift Log
- Written to: brain/scope-changes.md
- Entry appended: YES
- Timestamp: 2026-04-18T14:22:00Z
If the agent cannot produce that block with a real timestamp and a real file path, it cannot proceed. The governance doc references the same block. One source of truth, enforced at output time, not at review time.
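What "cannot proceed" means mechanically: a validator parses the block and rejects anything without a parseable timestamp and a concrete path. A sketch against the format above; the function name and return shape are mine, not part of the shipped protocol:

```js
// exit-gate.js: sketch of enforcing the Scope Drift Log block at output time.
// ASSUMPTION: the agent's full response is available as one string; field
// names mirror the block format shown above.
function validateScopeDriftGate(output) {
  const match = output.match(
    /## Scope Drift Log\s*\n- Written to: (brain\/[\w-]+\.md)\s*\n- Entry appended: (YES|NO)\s*\n- Timestamp: (\S+)/
  );
  if (!match) return { ok: false, reason: "gate block missing or malformed" };

  const [, path, appended, timestamp] = match;
  if (appended !== "YES") return { ok: false, reason: "no entry was appended" };
  if (Number.isNaN(Date.parse(timestamp))) {
    // Catches literal placeholders like YYYY-MM-DD as well as garbage.
    return { ok: false, reason: `timestamp does not parse: ${timestamp}` };
  }
  return { ok: true, path, timestamp };
}
```

A placeholder date fails Date.parse, so the same gate that fixed the CRITICAL also makes the YYYY-MM-DD class of bug unshippable.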
More on why prose rules fail and EXIT GATEs work: EXIT GATE, Not Prose.
Post-fix re-review
After shipping v0.9.0, I ran stella-review against the repo again. It caught a new drift I had introduced during the fix: I had added an EXIT GATE to Cipher Pol but forgotten to add one to punk-records, which has the same file-routing problem (updates both log-pose.md and vivre-cards.md). Without the gate, a future refactor could split the routes again.
That is the sign a governance framework is working: the re-review finds the next tier of drift, not the same tier you just fixed.
Key Takeaways
- Dogfood the framework on itself. If your governance rules break when applied to their own author, you haven’t written rules — you’ve written aspirations. Run them against your own source before shipping.
- Prose rules drift; structural gates don’t. Any rule you care about enforcing needs to live in the output schema, not in a paragraph the agent can skim past.
- A working audit finds the next tier of drift, not the same one twice. Re-run after every fix. If the same class of bug keeps surfacing, the fix was prose; if a new class surfaces, the fix was structural.
Satellite: Morgans (this post) · Lilith Red (review pass) · Pipeline: AUDIT — Stella Review → Morgans