How to paste prompts, compare N independent plans, score with rubric R1–R10, and require all 16 delivery sections in PLAN.md.
Goal: N comparable plans from N clean agent sessions. Same prompt pair, same rubric, same 16 sections — then aggregate.
Paste: PROMPT-BLIND-PLANNING.md (or its <prompt> block) plus the entire PROMPT-CORPUS-EMBEDS.md.
Output: outputs/alembic-blind-plan-<agent-id>/PLAN.md plus reverse/ artifacts.
Score: self-score 0–2 per criterion at the end of each plan. Source: PROMPT-BLIND-PLANNING.md § Fase 4.
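
The paste step can be scripted so every session receives the identical prompt pair. The following is a minimal sketch, not part of the project tooling: `build_payload` is a hypothetical helper, the default directory is taken from the primary-source paths (docs/alembic/), and the `---` separator between the two files is an assumption.

```shell
#!/usr/bin/env bash
# Concatenate the prompt pair into one paste-ready payload.
# build_payload is a hypothetical helper; the separator line is an assumption.
build_payload() {
  local dir="${1:-docs/alembic}"
  local f
  # Fail early if either half of the prompt pair is missing.
  for f in PROMPT-BLIND-PLANNING.md PROMPT-CORPUS-EMBEDS.md; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing prompt file: $dir/$f" >&2
      return 1
    fi
  done
  cat "$dir/PROMPT-BLIND-PLANNING.md"
  printf '\n---\n\n'
  cat "$dir/PROMPT-CORPUS-EMBEDS.md"
}
```

Running `build_payload > /tmp/blind-prompt.txt` from the repository root would write the payload once, so that the same file can be pasted into each clean agent session.
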
| ID | Criterion | Score 0 | Score 2 |
|---|---|---|---|
| R1 | Three REs with binary + disk + web evidence | Missing RE | All three forensically proven |
| R2 | Unified synthesis | Contradictions | Clean fusion table |
| R3 | Loop Engineering operationalized | Cited only | Gates wired in runtime |
| R4 | @alembic/harness spine reuse | Greenfield | Explicit package map |
| R5 | Operator interface | Backend only | CLI + TUI + web mockups |
| R6 | alembic.plan.ts spec | Sketch | Full hooks API |
| R7 | ~/.alembic/ layout | Vague | Complete tree |
| R8 | API protocol | Partial | REST/SSE/MCP/CLI |
| R9 | Slices A0–A9 | No proof gates | Measurable per slice |
| R10 | Comparable structure | Ad hoc | All 16 sections below |
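
Since each of the ten criteria is scored 0–2, a plan's total falls between 0 and 20. The sketch below sums one plan's self-scores and ranks several plans by total. Both function names and the `<agent-id> <total>` input format are assumptions made for illustration, as is treating a score of 1 as the intermediate value between the two columns shown in the table.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the rubric above; names and I/O formats are assumptions.

# Sum ten self-scores (R1..R10 in order); each must be 0, 1, or 2.
rubric_total() {
  if [ "$#" -ne 10 ]; then
    echo "expected 10 scores (R1..R10), got $#" >&2
    return 1
  fi
  local total=0 s
  for s in "$@"; do
    case "$s" in
      0|1|2) total=$((total + s)) ;;
      *) echo "score out of range: $s" >&2; return 1 ;;
    esac
  done
  echo "$total"
}

# Rank "<agent-id> <total>" lines from stdin, highest total first, ties by id.
rank_plans() {
  sort -k2,2nr -k1,1
}
```

For example, `rubric_total 2 2 1 2 0 1 2 2 1 2` prints `15`, and piping one `<agent-id> <total>` line per plan into `rank_plans` lists the strongest plan first.
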
Every PLAN.md must contain exactly these sections (markdown headings):
outputs/alembic-blind-plan-codex-gpt55/PLAN.md
outputs/alembic-blind-plan-codex-gpt55/reverse/
# Replace agent-id: codex-gpt55, kimi-k2, grok-4, ...
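
The output layout above can be prepared ahead of time for every agent id, and a finished PLAN.md can be given a rough structural check. This is a sketch under stated assumptions: all three function names are hypothetical, and `count_sections` assumes each delivery section is a level-2 (`## `) heading, which the blind prompt may or may not prescribe.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; only the outputs/alembic-blind-plan-<agent-id>/ layout
# comes from the document itself.

# Print the relative output directory for one agent id.
plan_dir() {
  printf 'outputs/alembic-blind-plan-%s\n' "$1"
}

# Create <root>/outputs/alembic-blind-plan-<agent-id>/reverse/ for each agent id.
scaffold_plans() {
  local root="$1"; shift
  local id
  for id in "$@"; do
    mkdir -p "$root/$(plan_dir "$id")/reverse"
  done
}

# Count level-2 headings in a PLAN.md (assumption: sections use "## ").
count_sections() {
  grep -c '^## ' "$1"
}
```

A call such as `scaffold_plans . codex-gpt55 kimi-k2 grok-4` creates the three directories. Under the heading-level assumption, `count_sections` should report 16 for a compliant plan. A matching count only shows that the number of headings is right, not that the headings are the required ones.
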
# Rerun after new brightdata scrapes land in corpus-cache/*.md
bash docs/alembic/build-corpus-embeds.sh
Working language: reasoning + deliverables in PT-BR; code, commands, identifiers in English (per blind prompt).
Primary sources: docs/alembic/PROMPT-BLIND-PLANNING.md · docs/alembic/PROMPT-CORPUS-EMBEDS.md · loop-engineering/SKILL.md