$0-floor ETL

The funnel is how Alembic turns a corpus of raw sources into validated business signals and learnings, at a cost that starts at exactly $0 and climbs only as candidates prove worth paying for. It is a four-tier cascade: T0 deterministic (free) → T1 local (~free) → T2 metered frontier shortlist → T3 council + verifier panel. Only a verified-GO emits. This is the data engine that the learning loop feeds from. Source: packages/harness/src/funnel.ts.
The cheapest tier touches 100% of the corpus; each subsequent tier is more expensive but sees only the survivors of the one before. The cost curve bends the right way:
| Tier | What it does | Cost |
|---|---|---|
| T0 | runT0Pipeline: walk → SHA-256 dedupe → contract-validate → 6-dim score → emit residue, over 100% of corpus (excluding Repos/Models + Repos/Prompts) | $0 |
| T1 | runT1Extraction: one BusinessSignal per residue item via the injected LOCAL adapter; free-tier, so never budget-blocked in practice | ~$0 |
| T2 | runT2Shortlist: a budget-gated FRONTIER shortlist refines the strongest T1 signals in batches; every paid call metered | metered |
| T3 | runT3Council: a synthetic 3-member council (optimist/analyst/pessimist, meets MIN_VALID_AGENTS=3) + the N-lens verifier panel | metered |
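The "each tier sees only the survivors" shape can be sketched as a chain of filters. This is a minimal illustration under stated assumptions, not the code in funnel.ts: the `Tier` type, `runCascade`, the integer per-item costs, and the toy thresholds are all hypothetical.

```typescript
// Hypothetical sketch: names, costs, and thresholds are illustrative only.
interface Tier<T> {
  name: string;
  costCentsPerItem: number;    // assumed flat per-item cost, in cents
  keep: (item: T) => boolean;  // decides which items survive to the next tier
}

interface TierStats {
  name: string;
  seen: number;
  survived: number;
  costCents: number;
}

function runCascade<T>(
  corpus: T[],
  tiers: Tier<T>[],
): { survivors: T[]; stats: TierStats[] } {
  let survivors = corpus;
  const stats: TierStats[] = [];
  for (const tier of tiers) {
    const seen = survivors.length; // a tier only pays for what reaches it
    survivors = survivors.filter(tier.keep);
    stats.push({
      name: tier.name,
      seen,
      survived: survivors.length,
      costCents: seen * tier.costCentsPerItem,
    });
  }
  return { survivors, stats };
}

// Toy corpus: the numbers stand in for candidate quality scores.
const corpus = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const result = runCascade(corpus, [
  { name: "T0", costCentsPerItem: 0, keep: (n) => n > 2 },
  { name: "T1", costCentsPerItem: 0, keep: (n) => n > 5 },
  { name: "T2", costCentsPerItem: 1, keep: (n) => n > 8 },
]);
console.log(result.stats);
```

In this toy run the free tiers see 10 and 8 items, while the only metered tier sees 5, so the paid cost scales with the shortlist rather than with the corpus.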
A T3 outcome only emits when both the consensus decision is GO and the N-lens panel approved emission (verified, not parked). A bare GO is not enough:
    // packages/harness/src/funnel.ts:496-514 (condensed): the emission gate
    const verified =
      consensus.decision === 'GO' &&
      isPanelEmissionApproved(report); // N-lens panel: verified, NOT parked
    if (verified) {
      // emit opportunity edges + learnings; surface on verifiedSignals (PII-safe)
    }
The verified-GO BusinessSignal[] are surfaced on FunnelReport.verifiedSignals — the PII-safe bridge into the marketing factory (distillAndMarket). Lesson 18 covers the panel; the point here is that the funnel demands consensus and independent verification before it spends downstream effort or emits anything outward.
A signal from a PRIVATE channel (whatsapp, discord, skool, circle) is redacted before the model call (extractionInput redacts) and gated again by assertRedactedForEmit before any write (emitSafeSignal). An unredacted private-channel signal is dropped, never emitted. Any non-zero value of FunnelReport.t1PiiBlocked is an alarm. Governed by ADR-0011.
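The redact-then-assert double gate might look roughly like the sketch below. The source does not show the signatures of extractionInput, assertRedactedForEmit, or emitSafeSignal, so `redactForExtraction`, `emitSafe`, the `Signal` shape, and the email-only toy redaction are hypothetical stand-ins.

```typescript
// Hypothetical sketch of a two-gate PII policy; not Alembic's actual code.
const PRIVATE_CHANNELS = new Set(["whatsapp", "discord", "skool", "circle"]);

interface Signal {
  channel: string;
  text: string;
  redacted: boolean;
}

// Gate 1 (assumed behaviour): scrub private-channel text before any model call.
function redactForExtraction(signal: Signal): Signal {
  if (!PRIVATE_CHANNELS.has(signal.channel)) return signal;
  // Toy redaction: mask anything that looks like an email address.
  const text = signal.text.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[REDACTED]");
  return { ...signal, text, redacted: true };
}

// Gate 2 (assumed behaviour): refuse to write an unredacted private-channel signal.
function emitSafe(
  signal: Signal,
  sink: Signal[],
  report: { piiBlocked: number },
): boolean {
  if (PRIVATE_CHANNELS.has(signal.channel) && !signal.redacted) {
    report.piiBlocked += 1; // dropped and counted, never emitted
    return false;
  }
  sink.push(signal);
  return true;
}

const sink: Signal[] = [];
const report = { piiBlocked: 0 };
const raw: Signal = {
  channel: "discord",
  text: "ping ana@example.com today",
  redacted: false,
};

const rawEmitted = emitSafe(raw, sink, report); // blocked at gate 2
const safeEmitted = emitSafe(redactForExtraction(raw), sink, report); // passes
console.log({ rawEmitted, safeEmitted, blocked: report.piiBlocked });
```

The second gate is deliberately independent of the first: even if a caller skips redaction, the write path still refuses the signal and increments the counter.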
Every paid (T2/T3) call is wrapped in a fail-closed BudgetGuard.check before dispatch; a projected breach blocks the call and the tier degrades rather than overspending. Pricing always uses the registry tier rate (pricingModelId), never a catalog override — so an overridden gateway model not in the registry is still metered against the cap by its tier rate. You cannot accidentally route around the budget.
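As a rough illustration of a fail-closed check priced by tier rather than by model name, consider the sketch below. The real BudgetGuard API is not shown in the source, so `BudgetGuardSketch`, `dispatchPaid`, the cent-denominated rates, and the model names are assumptions made for the example.

```typescript
// Hypothetical sketch of a fail-closed budget check; not the real BudgetGuard.
type TierId = "T2" | "T3";

// Assumed registry rates in cents per call; real rates live in the registry.
const TIER_RATE_CENTS: Record<TierId, number> = { T2: 2, T3: 10 };

class BudgetGuardSketch {
  private spentCents = 0;
  private readonly capCents: number;

  constructor(capCents: number) {
    this.capCents = capCents;
  }

  // True only when the projected spend provably stays within the cap.
  check(tier: TierId, calls: number): boolean {
    const projected = this.spentCents + TIER_RATE_CENTS[tier] * calls;
    // Fail closed: a non-finite or over-cap projection blocks the call.
    return Number.isFinite(projected) && projected <= this.capCents;
  }

  record(tier: TierId, calls: number): void {
    this.spentCents += TIER_RATE_CENTS[tier] * calls;
  }

  get spent(): number {
    return this.spentCents;
  }
}

// The pinned model name is ignored for pricing: only the tier rate counts.
function dispatchPaid(
  guard: BudgetGuardSketch,
  tier: TierId,
  _gatewayModel: string,
): "dispatched" | "degraded" {
  if (!guard.check(tier, 1)) return "degraded";
  guard.record(tier, 1);
  return "dispatched";
}

const guard = new BudgetGuardSketch(12); // 12-cent cap for the demo
const outcomes = [
  dispatchPaid(guard, "T3", "pinned-model-a"), // 10 of 12 cents used
  dispatchPaid(guard, "T3", "pinned-model-b"), // would reach 20: blocked
  dispatchPaid(guard, "T2", "pinned-model-c"), // reaches exactly 12
  dispatchPaid(guard, "T2", "pinned-model-c"), // would reach 14: blocked
];
console.log(outcomes, guard.spent);
```

Because the model name never enters the price lookup, swapping the pinned model cannot change what the guard projects.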
Results flow to the two stores via append-only, content-addressed, schema-validated writes; source reads stay read-only. The two outputs are the BUSINESS opportunity graph (Business/opportunity-graph.jsonl) and the LEARNINGS store (Skills/learning/learnings.jsonl) — the two value-chains of ADR-0002.
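One way to picture an append-only, content-addressed, schema-validated write is the in-memory sketch below. It assumes the content address is the SHA-256 of the serialized record and uses a hypothetical `LearningRecord` shape; the actual record schemas and store implementation are not shown in the source.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; the real schema is not shown in the source.
interface LearningRecord {
  kind: "learning";
  text: string;
}

// Schema gate: a hand-rolled check standing in for real schema validation.
function isLearningRecord(value: unknown): value is LearningRecord {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const text = v["text"];
  return v["kind"] === "learning" && typeof text === "string" && text.length > 0;
}

// In-memory stand-in for an append-only JSONL file.
class AppendOnlyStore {
  private readonly lines: string[] = [];
  private readonly seen = new Set<string>();

  // Returns the content address, or null when the record fails validation.
  append(value: unknown): string | null {
    if (!isLearningRecord(value)) return null;
    // Serialize with a fixed key order so equal records hash identically.
    const body = JSON.stringify({ kind: value.kind, text: value.text });
    const id = createHash("sha256").update(body).digest("hex");
    if (!this.seen.has(id)) {
      this.seen.add(id);
      this.lines.push(JSON.stringify({ id, kind: value.kind, text: value.text }));
    }
    return id; // re-appending identical content is a no-op
  }

  get size(): number {
    return this.lines.length;
  }

  toJsonl(): string {
    return this.lines.join("\n");
  }
}

const store = new AppendOnlyStore();
const firstId = store.append({ kind: "learning", text: "shortlists beat full scans" });
const secondId = store.append({ kind: "learning", text: "shortlists beat full scans" });
const rejected = store.append({ kind: "learning", text: "" });
console.log(store.size, firstId === secondId, rejected);
```

The store exposes no update or delete method, so existing lines can only be joined by new ones, and identical content always maps to the same address.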
Placing the orchestrator in @alembic/etl would force etl to depend upward on adapters/council, inverting the layer graph. So the deterministic T0 stays in @alembic/etl (pure, contracts-only) and the orchestrator that calls T0→T3 lives in @alembic/harness. The layering is preserved by where the code sits, not by convention.

Suppose the council decides GO but the N-lens panel parked one lens. Does the funnel emit the signal? No: emission needs both a GO and panel approval, so a parked outcome never reaches verifiedSignals.

The cost floor is $0: the local tiers are the design's floor, not a fallback. alembic distill <corpus> --offline runs the whole pipeline with an offline adapter registry and never touches a paid API.

Pricing uses the registry tier rate (pricingModelId), so even a catalog-overridden gateway model is metered by its tier. The budget guard sees the projected cost regardless of which concrete model name was pinned.