Lesson 15 · Engine & method · 2 of 8

The funnel: a 4-tier, $0-floor ETL

The funnel is how Alembic turns a corpus of raw sources into validated business signals and learnings — at a cost that starts at exactly $0 and only climbs as candidates prove worth paying for. It is a four-tier cascade: T0 deterministic (free) → T1 local (~free) → T2 metered frontier shortlist → T3 council + verifier panel. Only a verified-GO emits. This is the data engine the learning loop feeds from. Source: packages/harness/src/funnel.ts.

The cascade — narrow the spend, not the corpus

The cheapest tier touches 100% of the corpus; each later tier costs more per item but sees only the survivors of the tier before it. Spend therefore concentrates on the few candidates that have already cleared the cheaper filters:

Figure: the four-tier cascade, T0 deterministic walk ($0) → T1 local extraction (~$0) → T2 frontier shortlist (metered) → T3 council + N-lens panel (metered); only a verified-GO (GO ∧ panel-approved) survives to the bottom and emits.
| Tier | What it does | Cost |
|------|--------------|------|
| T0 | runT0Pipeline: walk → SHA-256 dedupe → contract-validate → 6-dim score → emit residue, over 100% of the corpus (excluding Repos/Models and Repos/Prompts) | $0 |
| T1 | runT1Extraction: one BusinessSignal per residue item via the injected LOCAL adapter; free tier, so never budget-blocked in practice | ~$0 |
| T2 | runT2Shortlist: a budget-gated FRONTIER shortlist refines the strongest T1 signals in batches; every paid call is metered | metered |
| T3 | runT3Council: a synthetic 3-member council (optimist/analyst/pessimist, meeting MIN_VALID_AGENTS=3) plus the N-lens verifier panel | metered |
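The narrowing can be sketched as a generic cascade in TypeScript, the language of funnel.ts. This is a minimal sketch, not the real runT0Pipeline/runT1Extraction/runT2Shortlist/runT3Council code: the Tier shape, the per-item costs and the keep-rules are hypothetical and exist only to show that each tier is charged for the survivors of the previous one, not for the whole corpus.

```typescript
// Sketch only: tier names follow the table; costs and keep-rules are hypothetical.
interface Tier<T> {
  name: string;
  costPerItem: number;        // hypothetical USD per item the tier touches
  keep: (item: T) => boolean; // items that pass flow on to the next tier
}

interface CascadeResult<T> {
  survivors: T[];
  seenPerTier: Record<string, number>;
  totalCost: number;
}

function runCascade<T>(corpus: T[], tiers: Tier<T>[]): CascadeResult<T> {
  let current = corpus;
  const seenPerTier: Record<string, number> = {};
  let totalCost = 0;
  for (const tier of tiers) {
    // Each tier is charged only for the survivors of the tier before it.
    seenPerTier[tier.name] = current.length;
    totalCost += current.length * tier.costPerItem;
    current = current.filter(tier.keep);
  }
  return { survivors: current, seenPerTier, totalCost };
}

// Usage: 10,000 toy items with arbitrary keep-rules.
const corpus = Array.from({ length: 10_000 }, (_, i) => i);
const result = runCascade(corpus, [
  { name: "T0", costPerItem: 0, keep: (i) => i % 10 === 0 },
  { name: "T1", costPerItem: 0, keep: (i) => i % 100 === 0 },
  { name: "T2", costPerItem: 0.01, keep: (i) => i % 1000 === 0 },
  { name: "T3", costPerItem: 0.1, keep: (i) => i % 5000 === 0 },
]);
console.log(result.seenPerTier); // { T0: 10000, T1: 1000, T2: 100, T3: 10 }
console.log(result.totalCost);   // 2
```

With these toy rules the metered tiers see only 100 and 10 of the 10,000 items; real shortlist sizes depend on the corpus and the budget.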

The verified-GO signal — two locks, not one

A T3 outcome emits only when the consensus decision is GO and the N-lens panel has approved emission (verified, not parked). A bare GO is not enough:

```typescript
// packages/harness/src/funnel.ts:496-514 (condensed): the emission gate
const verified =
  consensus.decision === 'GO' &&
  isPanelEmissionApproved(report);     // N-lens panel: verified, NOT parked
if (verified) {
  // emit opportunity edges + learnings; surface on verifiedSignals (PII-safe)
}
```

Verified-GO signals (BusinessSignal[]) are surfaced on FunnelReport.verifiedSignals, the PII-safe bridge into the marketing factory (distillAndMarket). Lesson 18 covers the panel; the point here is that the funnel requires both consensus and independent verification before it spends downstream effort or emits anything outward.
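The two-lock conjunction can be illustrated with self-contained stand-in types. Everything here is an assumption: the Consensus and PanelReport shapes, the lens names, and panelApproved, a hypothetical stand-in for isPanelEmissionApproved whose rule (every lens verified, none parked) is inferred from the lesson's wording rather than taken from funnel.ts.

```typescript
// Hypothetical shapes; the real consensus and panel report types are not shown in this lesson.
type Decision = "GO" | "NO_GO";
type LensVerdict = "verified" | "parked";

interface Consensus {
  decision: Decision;
}

interface PanelReport {
  lenses: { lens: string; verdict: LensVerdict }[];
}

// Assumption: the panel approves emission only if it has at least one lens
// and every lens came back verified; a single parked lens blocks emission.
function panelApproved(report: PanelReport): boolean {
  return report.lenses.length > 0 && report.lenses.every((l) => l.verdict === "verified");
}

// Lock 1 AND lock 2: a bare GO is never enough.
function isVerifiedGo(consensus: Consensus, report: PanelReport): boolean {
  return consensus.decision === "GO" && panelApproved(report);
}

const oneParked: PanelReport = {
  lenses: [
    { lens: "demand", verdict: "verified" },
    { lens: "risk", verdict: "parked" },
  ],
};
console.log(isVerifiedGo({ decision: "GO" }, oneParked)); // false
```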

Three safety invariants the funnel must never regress

① PII before egress

A signal from a PRIVATE channel (whatsapp, discord, skool, circle) is redacted before the model call (extractionInput performs the redaction) and gated again by assertRedactedForEmit before any write (emitSafeSignal). An unredacted private-channel signal is dropped, never emitted, and counted in FunnelReport.t1PiiBlocked; any non-zero value there is an alarm. Governed by ADR-0011.
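The two PII gates can be sketched as follows. Only the four private channels and the t1PiiBlocked counter name come from the lesson; the Signal shape, the regex-based redactor, the redacted flag, and the helpers prepareExtractionInput and emitIfSafe are hypothetical stand-ins for extractionInput, assertRedactedForEmit and emitSafeSignal.

```typescript
// The four private channels come from the lesson; the Signal shape, the regexes
// and the `redacted` flag are hypothetical stand-ins for the real helpers.
const PRIVATE_CHANNELS = new Set(["whatsapp", "discord", "skool", "circle"]);

interface Signal {
  channel: string;
  text: string;
  redacted: boolean;
}

interface PiiCounters {
  t1PiiBlocked: number;
}

// Hypothetical redactor: masks e-mail addresses and phone-like digit runs.
function redact(s: Signal): Signal {
  const text = s.text
    .replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, "[email]")
    .replace(/\+?\d[\d\s-]{7,}\d/g, "[phone]");
  return { ...s, text, redacted: true };
}

// Gate 1: private-channel input is redacted before any model call.
function prepareExtractionInput(s: Signal): Signal {
  return PRIVATE_CHANNELS.has(s.channel) ? redact(s) : s;
}

// Gate 2: re-checked at write time; an unredacted private signal is dropped
// and counted, never written and never thrown.
function emitIfSafe(s: Signal, counters: PiiCounters, out: Signal[]): boolean {
  if (PRIVATE_CHANNELS.has(s.channel) && !s.redacted) {
    counters.t1PiiBlocked += 1;
    return false;
  }
  out.push(s);
  return true;
}

const counters: PiiCounters = { t1PiiBlocked: 0 };
const written: Signal[] = [];
const raw: Signal = { channel: "whatsapp", text: "ping ana@example.com", redacted: false };
emitIfSafe(prepareExtractionInput(raw), counters, written); // redacted, then written
emitIfSafe(raw, counters, written);                         // skipped gate 1: dropped and counted
// written now holds one text with the e-mail masked; counters.t1PiiBlocked is 1
```

The second gate checks a flag rather than re-scanning the text, which is a simplification of this sketch; the point is that a missed first gate produces a dropped, counted signal instead of a write or a crash.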

② Budget fail-closed

Every paid (T2/T3) call is wrapped in a fail-closed BudgetGuard.check before dispatch; a projected breach blocks the call and the tier degrades rather than overspending. Pricing always uses the registry tier rate (pricingModelId), never a catalog override — so an overridden gateway model not in the registry is still metered against the cap by its tier rate. You cannot accidentally route around the budget.
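A fail-closed guard that prices by registry tier can be sketched like this. The rate table, its values, the PaidCall shape and BudgetGuardSketch are hypothetical; the lesson only establishes that pricing goes through pricingModelId and that a projected breach blocks the call. Treating an unknown tier as a block is an extra assumption made here to keep the guard fail-closed.

```typescript
// Hypothetical registry: tier rates in USD per 1K tokens. The real registry,
// rate values and BudgetGuard API are not shown in this lesson.
const TIER_RATE = new Map<string, number>([
  ["frontier", 0.01],
  ["council", 0.02],
]);

interface PaidCall {
  modelOverride?: string; // concrete gateway model pinned by config; never used for pricing
  pricingModelId: string; // registry tier the call is priced against
  estTokens: number;
}

class BudgetGuardSketch {
  private spent = 0;
  private readonly capUsd: number;

  constructor(capUsd: number) {
    this.capUsd = capUsd;
  }

  // Fail-closed: an unknown tier or a projected breach blocks the call.
  check(call: PaidCall): boolean {
    const rate = TIER_RATE.get(call.pricingModelId); // modelOverride deliberately ignored
    if (rate === undefined) return false;
    const projected = this.spent + (call.estTokens / 1000) * rate;
    if (projected > this.capUsd) return false;
    this.spent = projected; // reserve the projected cost before dispatch
    return true;
  }

  get spentUsd(): number {
    return this.spent;
  }
}

const guard = new BudgetGuardSketch(1.0);
console.log(guard.check({ pricingModelId: "frontier", estTokens: 50_000 })); // true
console.log(guard.check({ pricingModelId: "frontier", modelOverride: "not-in-registry", estTokens: 50_000 })); // true
console.log(guard.check({ pricingModelId: "frontier", estTokens: 1_000 })); // false
```

The override in the second call changes nothing about the price, so the third call still sees the cap as exhausted and is blocked.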

③ Append-only

Results flow to the two stores via append-only, content-addressed, schema-validated writes; source reads stay read-only. The two outputs are the BUSINESS opportunity graph (Business/opportunity-graph.jsonl) and the LEARNINGS store (Skills/learning/learnings.jsonl) — the two value-chains of ADR-0002.
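An append-only, content-addressed, schema-validated store can be sketched in memory with Node's built-in crypto module. The Learning shape, isLearning and AppendOnlyStore are hypothetical, and skipping a re-append of identical content is a design choice of this sketch; the real stores are the .jsonl files named above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory stand-in for a .jsonl store such as
// Skills/learning/learnings.jsonl. The Learning shape is invented for illustration.
interface Learning {
  topic: string;
  insight: string;
}

function isLearning(x: unknown): x is Learning {
  if (typeof x !== "object" || x === null) return false;
  const o = x as Record<string, unknown>;
  return typeof o.topic === "string" && typeof o.insight === "string";
}

class AppendOnlyStore<T extends object> {
  private readonly lines: string[] = [];
  private readonly ids = new Set<string>();
  private readonly validate: (x: unknown) => x is T;

  constructor(validate: (x: unknown) => x is T) {
    this.validate = validate;
  }

  // Returns the record's content id, or null if validation rejects it.
  append(record: unknown): string | null {
    if (!this.validate(record)) return null; // schema-validated before any write
    const body = JSON.stringify(record);      // assumes stable key order; a real store would canonicalize
    const id = createHash("sha256").update(body).digest("hex"); // content address
    if (!this.ids.has(id)) {
      // Identical content maps to the same id, so it is written at most once.
      this.ids.add(id);
      this.lines.push(JSON.stringify({ id, record }));
    }
    return id;
  }

  // Copy of the log; the class offers no update or delete.
  snapshot(): string[] {
    return [...this.lines];
  }
}

const store = new AppendOnlyStore(isLearning);
const a = store.append({ topic: "pricing", insight: "example insight" });
const b = store.append({ topic: "pricing", insight: "example insight" });
const c = store.append({ topic: 42 });
console.log(a === b, c, store.snapshot().length); // true null 1
```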

Why the funnel lives in the L4 harness, not in etl. It orchestrates L1 adapters + L2 council + L0 etl. Placing it in etl would force etl to depend upward on adapters/council, inverting the layer graph. So the deterministic T0 stays in @alembic/etl (pure, contracts-only) and the orchestrator that calls T0→T3 lives in @alembic/harness. The layering is preserved by where the code sits, not by convention.
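The layering rule can be expressed as a small dependency check. The layer numbers for etl, adapters, council and harness come from the callout; the package names @alembic/adapters and @alembic/council and the layerViolations helper are hypothetical.

```typescript
// Layers named in the lesson: etl L0, adapters L1, council L2, harness L4.
// "@alembic/adapters" and "@alembic/council" are hypothetical package names.
const LAYER = new Map<string, number>([
  ["@alembic/etl", 0],
  ["@alembic/adapters", 1],
  ["@alembic/council", 2],
  ["@alembic/harness", 4],
]);

// Flags every dependency edge that points upward (to a higher layer)
// or touches a package with no known layer.
function layerViolations(deps: Record<string, string[]>): string[] {
  const violations: string[] = [];
  for (const [pkg, targets] of Object.entries(deps)) {
    const from = LAYER.get(pkg);
    for (const dep of targets) {
      const to = LAYER.get(dep);
      if (from === undefined || to === undefined) {
        violations.push(`${pkg} -> ${dep}: unknown layer`);
      } else if (to > from) {
        violations.push(`${pkg} (L${from}) -> ${dep} (L${to})`);
      }
    }
  }
  return violations;
}

// Where the funnel actually lives: harness depends downward on everything it orchestrates.
console.log(layerViolations({ "@alembic/harness": ["@alembic/adapters", "@alembic/council", "@alembic/etl"] })); // []
// The rejected alternative: an orchestrator inside etl would need upward edges.
console.log(layerViolations({ "@alembic/etl": ["@alembic/adapters", "@alembic/council"] }));
```

The second call lists two upward edges, one per orchestrated package, which is exactly the inversion the callout warns about.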
1. A corpus has 10,000 items. Roughly how many reach the paid T2 tier?
Correct: b. T0 (free) touches 100%; T1 (free-tier local) extracts a signal per residue item; T2 (metered) refines only the strongest T1 signals in batches. The funnel narrows the spend, paying frontier rates only for candidates that already cleared two cheaper filters.
2. The consensus is GO but the N-lens panel parked one lens. Does the funnel emit the signal?
Correct: d. The verified-GO is a conjunction of two independent locks: consensus GO and an approved (verified, not parked) verifier panel. Either lock failing means nothing is emitted to verifiedSignals.
3. A WhatsApp-sourced signal somehow reaches the emit step still unredacted. What happens?
Correct: a. PII is redacted before the model call and re-checked before any write. An unredacted private-channel signal is never written; it is dropped and counted in FunnelReport.t1PiiBlocked. The funnel fails closed rather than crashing (invariant ①).

Common confusions

"Offline means degraded results." For T0/T1 it means hermetic, deterministic, and $0 — the local tiers are the design's floor, not a fallback. alembic distill <corpus> --offline runs the whole pipeline with an offline adapter registry and never touches a paid API.
"The budget cap can be bypassed with a model override." No — pricing always resolves through the registry tier rate (pricingModelId), so even a catalog-overridden gateway model is metered by its tier. The budget guard sees the projected cost regardless of which concrete model name was pinned.