Graveyard — preserved as evidence, demoted to tried-and-found-theatre

This Decision Cockpit is not working oversight. It is kept as evidence of a control we tried and found to be theatre. A dashboard that summarizes agent work for a human to approve — when the human cannot independently check the summary, and the summary is written by the untrusted agent — does not produce oversight. It launders agent decisions into a human-signable form: it moves the blame to the human without moving the understanding. Full accounting in docs/verification-theater-in-ai-agent-work.md and README.md.

What survived the dogfood instead is small and lives elsewhere (gates/): a handful of deterministic, human-approved gates that a human can read in full, run on inputs they choose, and confirm by the consequence rather than the printed verdict, plus a human who refuses to trust the agent's self-report. The one surviving rule: satisfied is not approval. The snapshot below is preserved unchanged as the artifact under report; read it as the thing we are demoting, not as a recommendation.

Decision Cockpit v1

Static handoff view. COLLAB.md remains authoritative.

Snapshot: 2026-05-31, Iteration 3 convergence pass. Refresh from COLLAB.md before relying on this view.

Human role right now

No decision needed.

Manual relay only. You are forwarding an agent-to-agent audit request because automatic handoff is not built yet.

Attention: Low | Quick scan only | Routine relay | Not a request to approve work

Next action: Send the audit request to Claude Code.
Paste target: Claude audit thread
Review depth: Quick scan for obvious wrong direction.
Lifecycle: implemented locally / audit pending / not approved / not merged / not released
Approval rule: Human approval is only needed when an exact named consequence is requested.

Codex implemented → Claude audits → Sami decides

Exact next action

Paste this to Claude Code

CLAUDE - AUDIT E6-ROUTING-COCKPIT-001 IMPLEMENTATION ITER 3

Audit the Codex Iteration 3 implementation against Sami's Iteration 3 authorization and the preserved implementation packet.

Read:
- .agent-handoff/COLLAB.md
- .agent-handoff/DASHBOARD.md
- .agent-handoff/DASHBOARD.html
- .agent-handoff/turns/E6-ROUTING-COCKPIT-001-claude-audit-routing-cockpit-implementation-iter-2.md
- .agent-handoff/turns/E6-ROUTING-COCKPIT-001-codex-routing-cockpit-implementation-iter-3.md

Verify:
- top of page distinguishes routine manual relay from a real human decision
- human role, attention level, review depth, exact next action, and paste target are visible
- quick-scan checklist is visible
- verification basis separates replayable checks, environment-dependent checks, visible artifacts, agent judgment, and human judgment
- replayable factual claims cite a command/result or tell the reader exactly what to rerun
- non-checkable claims are marked as agent judgment, not fact
- route strip and lifecycle stage are present
- done ≠ audited ≠ satisfied ≠ approved ≠ merged ≠ released remains visible
- Ask Coordinator and Pause remain visible as valid options
- slow-down triggers remain visible
- static dashboard only: no executable page code, inline event handlers, browser storage, external assets, hidden state, automation, notification layer, approval control, public claim, protocol edit, kit edit, global config, or scratch change
- localhost rendered browser QA evidence is real, or any gap is recorded honestly
- no no-touch files changed
- seven pre-existing duplicate-noise files remain untouched
- scratch dirs remain untouched

Do not implement, edit, stage, commit, branch, push, PR, merge, clean scratch, clean noise files, preserve, or broaden scope.

Return blockers, nits, missing controls, rendered-QA result, result state, and exact fixes if needed.

Verification basis

What is checkable, what is judgment?

Polished audit prose is not self-validating. Facts should point to a cheap replay path; judgment should be labeled as judgment.

Anyone-replayable deterministic checks

| Claim type | Replay path | Current interpretation |
| --- | --- | --- |
| Working tree shape | git status --short --branch --untracked-files=all | Run before relying; status changes as audit and preservation files are added. |
| Patch hygiene | git diff --check | Latest builder note records the actual result. |
| Static self-containment | Search the dashboard source for executable page code, inline event handlers, browser storage, external asset refs, timers, approval controls, and forbidden approval-framing text. | Latest builder note records exact searches and outputs. |
| Artifact size / identity | wc -l .agent-handoff/DASHBOARD.html plus a local hash command. | Run before relying; identity checks do not prove correctness. |
| No-touch boundary | Diff the no-touch paths named in the implementation authorization. | Latest builder note records the actual check. |
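
The static self-containment search can be sketched as a small, readable script. This is a minimal sketch under stated assumptions, not the builder's actual searches: the names StaticScan, scan, FORBIDDEN_TEXT, and EXTERNAL_PREFIXES are hypothetical, and the rule set covers only part of the list in the table (script elements, inline event handlers, browser storage, timers, and external asset references).

```python
from html.parser import HTMLParser

# Hypothetical rule set; the builder note's exact searches are not reproduced here.
FORBIDDEN_TEXT = ("localStorage", "sessionStorage", "setTimeout", "setInterval")
EXTERNAL_PREFIXES = ("http://", "https://", "//")


class StaticScan(HTMLParser):
    """Collects findings that suggest a page is not static and self-contained."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        line, _ = self.getpos()
        if tag == "script":
            self.findings.append((line, "script element"))
        for name, value in attrs:
            # Attribute names arrive lowercased, so "onload", "onclick", etc. match.
            if name.startswith("on"):
                self.findings.append((line, f"inline event handler: {name}"))
            # Treat src on any tag, and href on <link>, as asset references.
            is_asset = name == "src" or (tag == "link" and name == "href")
            if is_asset and (value or "").startswith(EXTERNAL_PREFIXES):
                self.findings.append((line, f"external reference: {value}"))


def scan(html_text):
    """Return a sorted list of (line_number, finding) pairs; empty means no hits."""
    parser = StaticScan()
    parser.feed(html_text)
    parser.close()
    findings = list(parser.findings)
    # Plain substring search for storage and timer names, line by line.
    for number, line in enumerate(html_text.splitlines(), start=1):
        for needle in FORBIDDEN_TEXT:
            if needle in line:
                findings.append((number, f"forbidden text: {needle}"))
    return sorted(findings)
```

An empty result only means these particular rules found nothing. Running scan on a deliberately dirty input as well as on the dashboard source lets the reader confirm the check by its consequence rather than by its printed verdict.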

Environment-dependent checks

  • Local render QA requires serving the handoff folder locally and opening the dashboard in Chrome.
  • Console and network observations must say whether they were actually captured after tooling was attached.
  • If rendered QA did not happen, the audit must say so.

Visible artifacts

  • DASHBOARD.md
  • DASHBOARD.html
  • COLLAB.md
  • Codex Iteration 3 builder note
  • Claude Iteration 2 audit note
  • PR metadata, only when a PR exists

Agent judgment

Useful but not self-validating: layout is clearer, risk framing is appropriate, human cognitive load is reduced, the quick-scan model is more humane, and this handoff is low attention.

Human judgment

Only Sami can authorize exact named consequences such as commit, PR, merge, release, cleanup, public claim, scope expansion, protocol change, kit change, credential/global config change, or durable behavior change.

Slow down if...

The relay turns into a decision

  • The named action is irreversible.
  • The named action includes approval, merge, PR creation, preservation, publication, release, public claim, credential, global config, or scratch cleanup.
  • The scope expands beyond this bounded dashboard convergence pass.
  • Evidence is unclear, missing, stale, or conflicts across agents.
  • Agent outputs disagree about whether the work passed.
  • There is pressure to approve quickly.
  • The exact action text is missing.
  • The human approver is uncertain.
  • The request would create hidden state, automation, memory, skills, subagents, scheduled checks, global config, network services, or runtime behavior.

Valid options: Ask Coordinator, Pause Pending, Reject / Redo, Reject / Close, or Authorize Exact Action only when Sami names the exact consequence.

Approval boundary

Do not collapse these states

done ≠ audited ≠ satisfied ≠ approved ≠ merged ≠ released

Drafted text is not approval. satisfied is not approval. Auditor pass is not approval. Model consensus is not approval. Sami is the only approver.

Irreversible, approval, scope-expanding, permission-changing, public, or durable behavior actions route to Sami. A classifier, dashboard, auditor, coordinator, or model consensus cannot waive human approval.
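
The boundary above can be written down as a few lines of code a human can read in full. This is an illustrative sketch only, not an implemented subsystem: Stage, counts_as_approval, and is_authorized are hypothetical names, and exact string equality is an assumed stand-in for "exact named consequence", which the snapshot does not define.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages named in the snapshot; each is a distinct value."""
    DONE = "done"
    AUDITED = "audited"
    SATISFIED = "satisfied"
    APPROVED = "approved"
    MERGED = "merged"
    RELEASED = "released"


APPROVER = "Sami"  # the only approver named in the snapshot


def counts_as_approval(stage):
    """Only APPROVED is approval; no other stage collapses into it."""
    return stage is Stage.APPROVED


def is_authorized(actor, named_consequence, requested_action):
    """Return True only when the approver named exactly the requested action.

    Assumption: string equality models "exact named consequence".
    """
    if actor != APPROVER:
        return False
    if not named_consequence.strip():
        return False
    return named_consequence == requested_action
```

For example, is_authorized("Sami", "commit", "commit and push") returns False because the requested action is broader than the consequence that was named, and counts_as_approval(Stage.SATISFIED) returns False.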

This screen authorizes

Audit relay only

  • Claude may audit the Iteration 3 local static dashboard implementation.
  • Claude may read the listed evidence files.
  • Claude may report blockers, nits, missing controls, rendered-QA result, and result state.
  • Codex stops after the builder report and waits for audit.

This screen does not authorize

No durable consequence

  • No commit, push, branch, PR, merge, preservation, or release.
  • No public claim, launch, protocol edit, kit edit, trust-layer work, credentials, global config, memory creation, skill creation, automation, or subagents.
  • No scratch cleanup or duplicate-noise cleanup.
  • No approval without exact named consequence.

Standard pattern mapping

These are routing metaphors and evidence inputs, not implemented subsystems.

| Standard pattern | Harness use | Boundary |
| --- | --- | --- |
| Reviewer gates | Claude audits Codex output and may upgrade route risk. | Auditor pass is evidence, not approval. |
| Policy checks | Allowed files, no-touch lists, stale/as-of state, verification commands. | Checks can block or inform; they do not approve. |
| Risk tiers | Routine manual relay is low attention; irreversible/public/config work is high attention. | Higher attention routes to Sami; tier labels do not authorize action. |
| CODEOWNERS / branch protection | Human owns consequences; auditor owns critique; builder owns scoped implementation. | Role ownership is not approval unless the role is the human approver. |
| CI/status checks | Diff hygiene, static searches, browser QA, and changed-file lists. | Passing checks are evidence inputs, not approval. |
| Escalation on ambiguity | UNCLEAR routes to Coordinator unless a human-required trigger is primary. | Ambiguity is not permission to proceed. |
| Human-in-the-loop review | Human decision actions use exact text. | Drafted text is not approval. |

Burden baseline and deferred work

No burden-reduction claim is made by this implementation. This section captures a baseline so later cockpit work can be measured instead of asserted.

| Metric | Baseline capture | Claim status |
| --- | --- | --- |
| Manual routing prompts / exact authorizations | Multiple exact Sami authorization prompts were required across Stage A packet preservation, Stage A execution, Stage A result preservation, Stage B proposal/result preservation, routing scope-lock preservation, implementation packet preservation, Iteration 1 implementation, Iteration 2 authorization, and Iteration 3 convergence. Exact count should be audited before use as a metric. | Baseline only; no reduction claim. |
| Ambiguous handoff moments | The routing scope-lock, implementation packet, and Iteration 3 convergence pass exist because the Stage A/B to preservation arc exposed repeated actor-routing friction and audit-trust friction. Exact count is unknown from repo-visible evidence alone. | Unknown fields cannot support a reduction claim. |
| Handoffs by actor class | Codex builder, Claude auditor, GPT coordinator synthesis, and Sami approval all appeared in the arc. Exact copy/paste count remains unknown without manual transcript counting. | Baseline only. |

Deferred

  • No automatic handoff; this is still manual relay.
  • No dashboard runtime, live routing engine, notification, or wakeup layer.
  • No automation, scheduled checks, subagents, memory, or skills.
  • No trust-layer implementation, public-proof run, release, kit cleanup, Stage B retry, or duplicate-noise cleanup.