Lesson 27 · Advanced · cost control

Tiers, cost & the fail-closed budget guard

An engine that fans one request to many models needs two orthogonal dials: autonomy (how much human oversight a piece of work requires) and cost (how much money a model call may spend). Alembic encodes the first as the Tier ladder T0→T4, the second as a model registry priced per 1k tokens plus a fail-closed BudgetGuard. This lesson connects them: how a tier routes to the cheapest qualifying model, how every paid call is metered, and why a non-positive budget means "free tier only" — never "unlimited."

The autonomy ladder: T0 → T4 (and the LOCAL marker)

The Tier enum has exactly five rungs, ordered from fully autonomous (T0) to parked (T4). The rungs describe human oversight, not money:

| Tier | Meaning | Autonomous? |
| --- | --- | --- |
| T0 | silent / fully autonomous, no human in the loop | yes (the silent path) |
| T1 | autonomous with lightweight logging | yes |
| T2 | autonomous, single reviewer notified | yes |
| T3 | autonomous, council review required | yes |
| T4 | PARK: withheld from autonomous execution; needs council + human | no |

Two facts make this fail-closed: DEFAULT_TIER = T4 (unclassified work parks, Lesson 26) and isAutonomous returns true only for T0–T3 (tier.ts:59), matching the table above. Separately, LOCAL is not a sixth tier: it is an orthogonal marker tagging work that must stay on local/$0 models "regardless of tier" (privacy- or cost-sensitive paths, tier.ts:36). A unit can be both T2 and LOCAL.
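
To make the two fail-closed facts concrete, here is a minimal sketch. It assumes a string-union Tier and a WorkUnit shape with a boolean `local` flag; both are hypothetical stand-ins, and the real definitions in tier.ts may look different. The isAutonomous rule follows the table above, where T4 is the only rung withheld from autonomous execution.

```typescript
// Hypothetical sketch; the real Tier, DEFAULT_TIER and isAutonomous live in tier.ts.
type Tier = 'T0' | 'T1' | 'T2' | 'T3' | 'T4';

// Unclassified work parks: the default is the most restrictive rung.
const DEFAULT_TIER: Tier = 'T4';

// Following the table above, only T4 is withheld from autonomous execution.
const isAutonomous = (tier: Tier): boolean => tier !== 'T4';

// LOCAL is modelled as a flag next to the tier, not as a member of Tier.
interface WorkUnit {
  id: string;
  tier?: Tier;     // missing => DEFAULT_TIER
  local?: boolean; // true => must stay on local/$0 models
}

const effectiveTier = (unit: WorkUnit): Tier => unit.tier ?? DEFAULT_TIER;
const mustStayLocal = (unit: WorkUnit): boolean => unit.local === true;

const reviewed: WorkUnit = { id: 'u1', tier: 'T2', local: true };
const unclassified: WorkUnit = { id: 'u2' };

const reviewedRuns = isAutonomous(effectiveTier(reviewed));         // true
const unclassifiedRuns = isAutonomous(effectiveTier(unclassified)); // false: parks at T4
```

Because the flag sits beside the tier, the unit `reviewed` is both T2 and LOCAL without any sixth enum member.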

The model registry: cost per 1k tokens

Each routable model is a registry entry with an adapterId, a tier, and two prices — costPer1kInputUsd and costPer1kOutputUsd. There are 11 entries spread across the tiers (the map records this; representative values shown):

// packages/contracts/src/registry.ts: shape + representative rows
// (`in` / `out` abbreviate costPer1kInputUsd / costPer1kOutputUsd)
{ modelId: 'local-default',       adapterId: 'local',       tier: T0, in: 0,       out: 0 }       // $0 hermetic
{ modelId: 'local-extract',       adapterId: 'local',       tier: T0, in: 0,       out: 0 }
{ modelId: 'qwen3.7-plus',        adapterId: 'cliproxyapi', tier: T1, in: 0.00015, out: 0.0005 }
{ modelId: 'glm-5.2',             adapterId: 'cliproxyapi', tier: T2, in: 0.0002,  out: 0.0006 }  // DEFAULT_MODEL_ID
{ modelId: 'gpt-5.5-xhigh',       adapterId: 'cliproxyapi', tier: T3, in: 0.015,   out: 0.045 }
{ modelId: 'claude-opus-4-8-max', adapterId: 'cliproxyapi', tier: T3, in: 0.03,    out: 0.15 }

The split across tiers (per the complete-map §6): T0 = local-default, local-extract ($0); T1 = kimi-k2.7-code-highspeed, grok-composer-2.5-fast, qwen3.7-plus; T2 = deepseek-v4-pro, gemini-3.5-flash, glm-5.2, qwen3.7-max; T3 = gpt-5.5-xhigh, claude-opus-4-8-max. The local adapter keeps T0 at $0 for hermetic CI. (Model ids/prices are illustrative registry data and evolve; the shape is the contract.)
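
As a worked example of per-1k pricing, the sketch below prices one call against three of the representative rows. The field names come from the lesson; the Usage shape, the priceCall helper and the token counts are assumptions made for illustration, and the prices are the illustrative ones quoted above.

```typescript
// Hypothetical helper; only the field names are taken from the lesson.
type Tier = 'T0' | 'T1' | 'T2' | 'T3' | 'T4';

interface ModelEntry {
  modelId: string;
  adapterId: string;
  tier: Tier;
  costPer1kInputUsd: number;
  costPer1kOutputUsd: number;
}

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Three representative rows, written with the full field names.
const rows: ModelEntry[] = [
  { modelId: 'local-default', adapterId: 'local', tier: 'T0', costPer1kInputUsd: 0, costPer1kOutputUsd: 0 },
  { modelId: 'glm-5.2', adapterId: 'cliproxyapi', tier: 'T2', costPer1kInputUsd: 0.0002, costPer1kOutputUsd: 0.0006 },
  { modelId: 'claude-opus-4-8-max', adapterId: 'cliproxyapi', tier: 'T3', costPer1kInputUsd: 0.03, costPer1kOutputUsd: 0.15 },
];

// Prices are per 1,000 tokens, so each token count is scaled by 1/1000.
const priceCall = (entry: ModelEntry, usage: Usage): number =>
  (usage.inputTokens / 1000) * entry.costPer1kInputUsd +
  (usage.outputTokens / 1000) * entry.costPer1kOutputUsd;

const sampleUsage: Usage = { inputTokens: 2000, outputTokens: 500 };
const quotes = rows.map((r) => ({ modelId: r.modelId, usd: priceCall(r, sampleUsage) }));
// local-default: 0
// glm-5.2: 2 * 0.0002 + 0.5 * 0.0006, about 0.0007
// claude-opus-4-8-max: 2 * 0.03 + 0.5 * 0.15, about 0.135
```

The same call is roughly 190 times more expensive on the T3 row than on the T2 row, which is why routing and the budget cap both matter.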

Routing: pick the cheapest qualifying model

When a tier is chosen but no specific modelId is pinned, pickCheapestForTier selects the cheapest entry of that tier by combined per-1k cost. It's pure over the registry — a deterministic fold, no side effects:

// packages/contracts/src/registry.ts:145-156
export const pickCheapestForTier = (tier, registry = MODEL_REGISTRY) => {
  const candidates = Object.values(registry).filter((e) => e.tier === tier);
  if (candidates.length === 0) return undefined;
  return candidates.reduce((cheapest, entry) => {
    const entryCost = entry.costPer1kInputUsd + entry.costPer1kOutputUsd;   // in + out
    const bestCost = cheapest.costPer1kInputUsd + cheapest.costPer1kOutputUsd;
    return entryCost < bestCost ? entry : cheapest;
  });
};

Why combined input+output? A model that is cheap on input but expensive on output can end up costing more on a generation-heavy task, where output tokens dominate. Summing both prices gives a single comparable scalar. ADR-0006 layers a depth floor on top: the router should pick the cheapest model above a quality floor, reserving frontier models (the T3 pair) for hard adjudication, so "cheapest" never means "too weak."
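
One way the depth floor could sit on top of the cheapest-by-tier fold is sketched below. This is not the ADR-0006 implementation: the `quality` field, the pickCheapestAboveFloor name, and the three demo models with their prices and scores are all invented for illustration.

```typescript
// Sketch under stated assumptions: `quality` and the demo registry are hypothetical.
type Tier = 'T0' | 'T1' | 'T2' | 'T3' | 'T4';

interface ModelEntry {
  modelId: string;
  tier: Tier;
  costPer1kInputUsd: number;
  costPer1kOutputUsd: number;
  quality: number; // HYPOTHETICAL score in [0, 1]; not a field named in the lesson
}

const combinedCost = (e: ModelEntry): number =>
  e.costPer1kInputUsd + e.costPer1kOutputUsd;

// Same fold as pickCheapestForTier, with one extra filter for the floor.
const pickCheapestAboveFloor = (
  tier: Tier,
  floor: number,
  registry: Record<string, ModelEntry>,
): ModelEntry | undefined => {
  const candidates = Object.values(registry).filter(
    (e) => e.tier === tier && e.quality >= floor,
  );
  if (candidates.length === 0) return undefined;
  return candidates.reduce((cheapest, entry) =>
    combinedCost(entry) < combinedCost(cheapest) ? entry : cheapest,
  );
};

// Made-up models, chosen only so the floor changes the outcome.
const demo: Record<string, ModelEntry> = {
  small: { modelId: 'small', tier: 'T2', costPer1kInputUsd: 0.0001, costPer1kOutputUsd: 0.0003, quality: 0.4 },
  medium: { modelId: 'medium', tier: 'T2', costPer1kInputUsd: 0.0002, costPer1kOutputUsd: 0.0006, quality: 0.7 },
  large: { modelId: 'large', tier: 'T2', costPer1kInputUsd: 0.0004, costPer1kOutputUsd: 0.0012, quality: 0.9 },
};

const noFloor = pickCheapestAboveFloor('T2', 0, demo);      // small
const withFloor = pickCheapestAboveFloor('T2', 0.6, demo);  // medium
const tooStrict = pickCheapestAboveFloor('T2', 0.95, demo); // undefined
```

The function stays pure over its registry argument, so adding the floor does not cost the determinism that the plain fold provides.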

The budget guard: fail-closed metering

Routing decides which model; the BudgetGuard decides whether the call may happen at all. It's created with a hard USD cap and three methods — and its defaults are deliberately paranoid:

// packages/etl/src/budget.ts:120-152 (condensed)
export const createBudgetGuard = (capUsd) => {
  const cap = Math.max(0, capUsd);   // a non-positive cap clamps to 0 = free-tier only
  let spent = 0;
  return {
    check(estimate) {
      const projectedUsd = priceEstimate(estimate);
      if (projectedUsd === 0) return { ok: true, projectedUsd: 0, … };   // free call always passes
      if (roundUsd(spent + projectedUsd) > cap)
        return { ok: false, reason: 'budget_exceeded', … };       // would overshoot ⇒ BLOCK
      return { ok: true, projectedUsd, remainingUsd: remaining() };
    },
    record(spend) { spent = roundUsd(spent + costOf(spend)); return spent; },
  };
};

The fail-closed move: cap = Math.max(0, capUsd)

A non-positive cap doesn't mean "no limit" — it clamps to 0, which means free tier only: any call with projectedUsd > 0 is blocked, because 0 + anything > 0. So the safe default (no budget set) is the cheapest possible posture, never a runaway spend. Free-tier calls (projectedUsd === 0) always pass, which is why the whole funnel can run $0 and hermetic on the LOCAL adapter (Lesson 15).
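
The clamp is easy to exercise in isolation. The sketch below is a simplified stand-in for the condensed listing: estimates are passed as plain USD numbers, roundUsd is a hypothetical six-decimal rounding helper, and the Number.isFinite check is an addition of this sketch that does not appear in the listing.

```typescript
// Simplified sketch; not the code in packages/etl/src/budget.ts.
interface CheckResult {
  ok: boolean;
  projectedUsd: number;
  reason?: 'budget_exceeded';
}

// Hypothetical helper: round to micro-dollars to keep float drift out of comparisons.
const roundUsd = (usd: number): number => Math.round(usd * 1e6) / 1e6;

const createBudgetGuard = (capUsd: number) => {
  // Addition of this sketch: treat NaN/Infinity as "no budget" as well.
  const cap = Number.isFinite(capUsd) ? Math.max(0, capUsd) : 0;
  let spent = 0;
  return {
    check(projectedUsd: number): CheckResult {
      if (projectedUsd === 0) return { ok: true, projectedUsd: 0 }; // free call always passes
      if (roundUsd(spent + projectedUsd) > cap) {
        return { ok: false, projectedUsd, reason: 'budget_exceeded' };
      }
      return { ok: true, projectedUsd };
    },
    record(usd: number): number {
      spent = roundUsd(spent + usd);
      return spent;
    },
  };
};

const unset = createBudgetGuard(-5);  // negative cap clamps to 0
const freeCall = unset.check(0);      // ok: true
const paidCall = unset.check(0.0007); // ok: false, reason: 'budget_exceeded'

const capped = createBudgetGuard(0.001);
const firstCall = capped.check(0.0007);  // ok: true, 0.0007 fits under 0.001
capped.record(0.0007);
const secondCall = capped.check(0.0007); // ok: false, 0.0014 would overshoot
```

One edge case may be worth checking against the real guard: Math.max(0, NaN) evaluates to NaN, and every comparison with NaN is false, so a NaN cap would never trip the overshoot branch. The Number.isFinite check above is how this sketch closes that gap.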

Pricing is always applied at metering

The map states a subtle invariant: pricing is always applied. Look at record's helper costOf — even when a ModelRunResult carries a costUsd, a free-tier model is metered as 0 (isFreeTierModel(spend.modelId), budget.ts:110), and a paid model falls back to costForModel(modelId, usage) if no explicit cost is present. Spend is never guessed and never skipped: T2/T3 calls are metered to the registry price, so the running spent total is authoritative.
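
The precedence described above can be sketched as follows. The function names costOf, isFreeTierModel and costForModel come from the lesson, but their bodies here are guesses: the price table, the rule that "free tier" means both prices are zero, and the error on an unknown model are all assumptions of this sketch.

```typescript
// Sketch of the metering precedence; bodies are assumptions, names are from the lesson.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

interface ModelRunResult {
  modelId: string;
  usage: Usage;
  costUsd?: number; // explicit cost, if the adapter reported one
}

interface Price {
  costPer1kInputUsd: number;
  costPer1kOutputUsd: number;
}

// Hypothetical mini price table using two of the lesson's illustrative rows.
const PRICES: Record<string, Price> = {
  'local-default': { costPer1kInputUsd: 0, costPer1kOutputUsd: 0 },
  'glm-5.2': { costPer1kInputUsd: 0.0002, costPer1kOutputUsd: 0.0006 },
};

// Assumption: a model is free-tier when both of its prices are zero.
const isFreeTierModel = (modelId: string): boolean => {
  const p: Price | undefined = PRICES[modelId];
  return p !== undefined && p.costPer1kInputUsd === 0 && p.costPer1kOutputUsd === 0;
};

const costForModel = (modelId: string, usage: Usage): number => {
  const p: Price | undefined = PRICES[modelId];
  if (p === undefined) throw new Error(`unknown model: ${modelId}`); // sketch choice: refuse to guess
  return (
    (usage.inputTokens / 1000) * p.costPer1kInputUsd +
    (usage.outputTokens / 1000) * p.costPer1kOutputUsd
  );
};

const costOf = (spend: ModelRunResult): number => {
  if (isFreeTierModel(spend.modelId)) return 0; // checked before costUsd is read
  return spend.costUsd ?? costForModel(spend.modelId, spend.usage);
};

const sample: Usage = { inputTokens: 2000, outputTokens: 500 };
const strayCost = costOf({ modelId: 'local-default', usage: sample, costUsd: 0.01 }); // 0
const fallback = costOf({ modelId: 'glm-5.2', usage: sample });                      // about 0.0007
const explicit = costOf({ modelId: 'glm-5.2', usage: sample, costUsd: 0.002 });      // 0.002
```

Checking the free-tier rule first is what keeps a stray costUsd on a local model from inflating the running total.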

Flow: pickCheapestForTier (in+out, ≥ floor) → guard.check(est) (fits cap?) → run() (waist, only if ok) → guard.record() (meter to price). If check fails ⇒ budget_exceeded, and the call never runs.

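
The ordering of check, run and record can be sketched end to end. Everything here is a stand-in: makeGuard is a cut-down guard, runModel is a stub for the real model call, and the sketch records the estimate itself, whereas the lesson's record() meters a ModelRunResult through costOf.

```typescript
// Hypothetical wiring of check -> run -> record; names other than check/record are invented.
interface Guard {
  check(projectedUsd: number): { ok: boolean };
  record(usd: number): number;
}

const makeGuard = (capUsd: number): Guard => {
  const cap = Math.max(0, capUsd);
  let spent = 0;
  return {
    check: (projectedUsd) => ({ ok: projectedUsd === 0 || spent + projectedUsd <= cap }),
    record: (usd) => {
      spent += usd;
      return spent;
    },
  };
};

// Stub for the model call; the counter shows whether it was ever reached.
let runs = 0;
const runModel = (prompt: string): string => {
  runs += 1;
  return `echo: ${prompt}`;
};

type Outcome =
  | { status: 'ran'; output: string; spentUsd: number }
  | { status: 'budget_exceeded' };

const meteredCall = (guard: Guard, projectedUsd: number, prompt: string): Outcome => {
  if (!guard.check(projectedUsd).ok) return { status: 'budget_exceeded' }; // runModel is never called
  const output = runModel(prompt);
  const spentUsd = guard.record(projectedUsd); // sketch: meter the estimate
  return { status: 'ran', output, spentUsd };
};

const guard = makeGuard(0.001);
const callA = meteredCall(guard, 0.0007, 'first');  // status: 'ran'
const callB = meteredCall(guard, 0.0007, 'second'); // status: 'budget_exceeded'
// runs is 1: the blocked call never reached runModel
```

Returning early on a failed check, before any call is made, is what makes the guard a gate rather than an after-the-fact report.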
1. You create createBudgetGuard(0) (or a negative cap). What can run?
Answer: only free calls. cap = Math.max(0, capUsd) clamps to 0, and check passes a call only if projectedUsd === 0 or it fits under the cap. With cap 0, only free calls (LOCAL/T0) pass. This is the fail-closed behavior: the default posture spends nothing.
2. Two T2 models cost (in+out) $0.0006 and $0.0009 per 1k. With no pinned modelId, which does pickCheapestForTier(T2) return, and is the choice deterministic?
Answer: the $0.0006 model, and yes, the choice is deterministic. The reduce keeps the entry with the smaller costPer1kInputUsd + costPer1kOutputUsd. It is pure over MODEL_REGISTRY (no Date/random), so the same registry always yields the same pick, which matters for replay (Lesson 28).
3. A ModelRunResult from a free-tier model carries a stray costUsd: 0.01. How does record meter it?
Answer: as 0. costOf (budget.ts:109-111) returns 0 for a free-tier model before reading costUsd. Pricing is always applied from the model's tier/registry, so a stray field cannot inflate the spend total. Metering is authoritative, not advisory.

Common confusions

"LOCAL is the sixth tier." No — Tier is exactly T0–T4. LOCAL is an orthogonal marker that pins work to $0 models regardless of its tier. A unit can be both T2 and LOCAL: autonomous-with-a-reviewer, but kept on local models for privacy or cost.
"Tier sets the cost." They're related but separate. Tier is autonomy/oversight; cost is per-model pricing. A T3 council step uses pricier models, yes — but the budget guard, not the tier, is what stops a paid call from exceeding the cap. Two dials, two mechanisms.