Lesson 27 · Advanced · cost control

Tiers, cost & the fail-closed budget guard

An engine that fans one request out to many models needs two orthogonal dials: autonomy (how much human oversight a piece of work requires) and cost (how much money a model call may spend). Alembic encodes the first as the Tier ladder T0→T4, and the second as a model registry priced per 1k tokens plus a fail-closed BudgetGuard. This lesson connects them: how a tier routes to the cheapest qualifying model, how every paid call is metered, and why a non-positive budget means "free tier only", never "unlimited."

The autonomy ladder: T0 → T4 (and the LOCAL marker)

The Tier enum is exactly five rungs, ordered from least human oversight (T0) to most (T4). The rungs describe oversight, not money:

Tier	Meaning	Autonomous?
T0	silent / fully autonomous, no human in the loop	yes (the silent path)
T1	autonomous with lightweight logging	yes
T2	autonomous, single reviewer notified	yes
T3	autonomous, council review required	yes
T4	PARK — withheld from autonomous execution; needs council + human	no

Two facts make this fail-closed: DEFAULT_TIER = T4 (unclassified work parks, Lesson 26) and isAutonomous returns true only for T1–T3 (tier.ts:59). Separately, LOCAL is not a sixth tier — it's an orthogonal marker tagging work that must stay on local/$0 models "regardless of tier" (privacy- or cost-sensitive paths, tier.ts:36). A unit can be T2 and LOCAL.
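To make the "tier plus marker" idea concrete, here is a minimal sketch. Only Tier, DEFAULT_TIER and the LOCAL concept come from the lesson; the string-union encoding, the WorkUnit shape, the boolean local flag and the resolveTier helper are hypothetical names chosen for illustration, not the contents of tier.ts.

```typescript
// Assumed encoding: the lesson does not show how Tier is declared.
type Tier = 'T0' | 'T1' | 'T2' | 'T3' | 'T4';

// From the lesson: unclassified work parks at T4.
const DEFAULT_TIER: Tier = 'T4';

// Hypothetical shape: LOCAL is modeled as a flag beside the tier, not as a tier value.
interface WorkUnit {
  id: string;
  tier?: Tier;     // may be absent if the unit was never classified
  local: boolean;  // true = must stay on local/$0 models, regardless of tier
}

// Fail-closed resolution: a missing tier falls back to T4 (PARK).
const resolveTier = (unit: WorkUnit): Tier => unit.tier ?? DEFAULT_TIER;

const reviewed: WorkUnit = { id: 'u1', tier: 'T2', local: true };
const unclassified: WorkUnit = { id: 'u2', local: false };

console.log(resolveTier(reviewed), reviewed.local); // T2 true
console.log(resolveTier(unclassified));             // T4
```

Keeping the marker as a separate field is what lets one unit be both T2 and LOCAL without adding a rung to the ladder.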

The model registry: cost per 1k tokens

Each routable model is a registry entry with an adapterId, a tier, and two prices — costPer1kInputUsd and costPer1kOutputUsd. There are 11 entries spread across the tiers (the map records this; representative values shown):

// packages/contracts/src/registry.ts: shape + representative rows
// (in / out abbreviate costPer1kInputUsd / costPer1kOutputUsd)
{ modelId: 'local-default',       adapterId: 'local',       tier: T0, in: 0,       out: 0 }       // $0 hermetic
{ modelId: 'local-extract',       adapterId: 'local',       tier: T0, in: 0,       out: 0 }
{ modelId: 'qwen3.7-plus',        adapterId: 'cliproxyapi', tier: T1, in: 0.00015, out: 0.0005 }
{ modelId: 'glm-5.2',             adapterId: 'cliproxyapi', tier: T2, in: 0.0002,  out: 0.0006 }  // DEFAULT_MODEL_ID
{ modelId: 'gpt-5.5-xhigh',       adapterId: 'cliproxyapi', tier: T3, in: 0.015,   out: 0.045 }
{ modelId: 'claude-opus-4-8-max', adapterId: 'cliproxyapi', tier: T3, in: 0.03,    out: 0.15 }

The split across tiers (per the complete-map §6): T0 = local-default, local-extract ($0); T1 = kimi-k2.7-code-highspeed, grok-composer-2.5-fast, qwen3.7-plus; T2 = deepseek-v4-pro, gemini-3.5-flash, glm-5.2, qwen3.7-max; T3 = gpt-5.5-xhigh, claude-opus-4-8-max. The local adapter keeps T0 at $0 for hermetic CI. (Model ids/prices are illustrative registry data and evolve; the shape is the contract.)

Routing: pick the cheapest qualifying model

When a tier is chosen but no specific modelId is pinned, pickCheapestForTier selects the cheapest entry of that tier by combined per-1k cost. It's pure over the registry — a deterministic fold, no side effects:

// packages/contracts/src/registry.ts:145-156
export const pickCheapestForTier = (tier, registry = MODEL_REGISTRY) => {
  const candidates = Object.values(registry).filter((e) => e.tier === tier);
  if (candidates.length === 0) return undefined;
  return candidates.reduce((cheapest, entry) => {
    const entryCost = entry.costPer1kInputUsd + entry.costPer1kOutputUsd;   // in + out
    const bestCost = cheapest.costPer1kInputUsd + cheapest.costPer1kOutputUsd;
    return entryCost < bestCost ? entry : cheapest;
  });
};

Why combined input+output? Ranking on input price alone could pick a model whose output price makes it the costlier choice on a generation-heavy task. Summing both gives a single comparable scalar (one that weights input and output tokens equally). ADR-0006 layers a depth floor on top: the router should pick the cheapest model above a quality floor, reserving frontier models (the T3 pair) for hard adjudication, so "cheapest" never means "too weak."
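As a worked example, the fold can be run against a cut-down registry built from the representative rows. This is a sketch under assumptions: the ModelEntry interface, the string-union Tier and the SAMPLE_REGISTRY name are illustrative, and the registry here holds five rows rather than the real eleven.

```typescript
// Assumed declarations: the lesson names the fields but does not show the types.
type Tier = 'T0' | 'T1' | 'T2' | 'T3' | 'T4';

interface ModelEntry {
  modelId: string;
  adapterId: string;
  tier: Tier;
  costPer1kInputUsd: number;
  costPer1kOutputUsd: number;
}

// Hypothetical sample built from the representative rows (illustrative prices).
const SAMPLE_REGISTRY: Record<string, ModelEntry> = {
  'local-default': { modelId: 'local-default', adapterId: 'local', tier: 'T0', costPer1kInputUsd: 0, costPer1kOutputUsd: 0 },
  'qwen3.7-plus': { modelId: 'qwen3.7-plus', adapterId: 'cliproxyapi', tier: 'T1', costPer1kInputUsd: 0.00015, costPer1kOutputUsd: 0.0005 },
  'glm-5.2': { modelId: 'glm-5.2', adapterId: 'cliproxyapi', tier: 'T2', costPer1kInputUsd: 0.0002, costPer1kOutputUsd: 0.0006 },
  'gpt-5.5-xhigh': { modelId: 'gpt-5.5-xhigh', adapterId: 'cliproxyapi', tier: 'T3', costPer1kInputUsd: 0.015, costPer1kOutputUsd: 0.045 },
  'claude-opus-4-8-max': { modelId: 'claude-opus-4-8-max', adapterId: 'cliproxyapi', tier: 'T3', costPer1kInputUsd: 0.03, costPer1kOutputUsd: 0.15 },
};

// Same fold as the excerpt, with type annotations added.
const pickCheapestForTier = (
  tier: Tier,
  registry: Record<string, ModelEntry> = SAMPLE_REGISTRY,
): ModelEntry | undefined => {
  const candidates = Object.values(registry).filter((e) => e.tier === tier);
  if (candidates.length === 0) return undefined;
  return candidates.reduce((cheapest, entry) => {
    const entryCost = entry.costPer1kInputUsd + entry.costPer1kOutputUsd;
    const bestCost = cheapest.costPer1kInputUsd + cheapest.costPer1kOutputUsd;
    return entryCost < bestCost ? entry : cheapest; // strict <: on a tie the earlier entry wins
  });
};

console.log(pickCheapestForTier('T3')?.modelId); // gpt-5.5-xhigh (0.06 combined vs 0.18)
console.log(pickCheapestForTier('T4'));          // undefined: no entry carries T4
```

Because the comparison is a strict less-than, two equally priced entries resolve to whichever appears first in the registry, so the pick stays deterministic even on ties.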

The budget guard: fail-closed metering

Routing decides which model; the BudgetGuard decides whether the call may happen at all. It's created with a hard USD cap and exposes three methods (the condensed listing below shows check and record), and its defaults are deliberately paranoid:

// packages/etl/src/budget.ts:120-152 (condensed)
export const createBudgetGuard = (capUsd) => {
  const cap = Math.max(0, capUsd);   // a non-positive cap clamps to 0 = free-tier only
  let spent = 0;
  return {
    check(estimate) {
      const projectedUsd = priceEstimate(estimate);
      if (projectedUsd === 0) return { ok: true, projectedUsd: 0, … };   // free call always passes
      if (roundUsd(spent + projectedUsd) > cap)
        return { ok: false, reason: 'budget_exceeded', … };       // would overshoot ⇒ BLOCK
      return { ok: true, projectedUsd, remainingUsd: remaining() };
    },
    record(spend) { spent = roundUsd(spent + costOf(spend)); return spent; },
  };
};

The fail-closed move: cap = Math.max(0, capUsd)

A non-positive cap doesn't mean "no limit". It clamps to 0, which means free tier only: any call with projectedUsd > 0 is blocked, because spent + projectedUsd is then greater than a cap of 0. So the safe default (no budget set) is the cheapest possible posture, never a runaway spend. Free-tier calls (projectedUsd === 0) always pass, which is why the whole funnel can run $0 and hermetic on the local adapter (Lesson 15).
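The clamp can be exercised with a small self-contained sketch. It is a simplification under stated assumptions: the estimate is taken to be already priced (the real guard calls priceEstimate), roundUsd is assumed to round to six decimals, record takes a plain USD amount instead of going through costOf, and the CheckResult and Estimate types are hypothetical.

```typescript
// Assumed body: the lesson names roundUsd but does not show it.
const roundUsd = (usd: number): number => Math.round(usd * 1e6) / 1e6;

// Simplification: the estimate arrives already priced in USD.
interface Estimate { projectedUsd: number }

type CheckResult =
  | { ok: true; projectedUsd: number; remainingUsd: number }
  | { ok: false; reason: 'budget_exceeded'; projectedUsd: number };

const createBudgetGuard = (capUsd: number) => {
  const cap = Math.max(0, capUsd); // non-positive cap clamps to 0 = free tier only
  let spent = 0;
  const remaining = (): number => roundUsd(cap - spent);
  return {
    check(estimate: Estimate): CheckResult {
      const projectedUsd = estimate.projectedUsd;
      if (projectedUsd === 0) return { ok: true, projectedUsd: 0, remainingUsd: remaining() };
      if (roundUsd(spent + projectedUsd) > cap) {
        return { ok: false, reason: 'budget_exceeded', projectedUsd }; // would overshoot: block
      }
      return { ok: true, projectedUsd, remainingUsd: remaining() };
    },
    record(costUsd: number): number {
      spent = roundUsd(spent + costUsd);
      return spent;
    },
  };
};

const negativeCap = createBudgetGuard(-5);                       // clamps to 0
const freeCall = negativeCap.check({ projectedUsd: 0 });         // passes
const paidCall = negativeCap.check({ projectedUsd: 0.0007 });    // blocked

const smallCap = createBudgetGuard(0.001);
const firstPaid = smallCap.check({ projectedUsd: 0.0007 });      // fits under the cap
smallCap.record(0.0007);
const secondPaid = smallCap.check({ projectedUsd: 0.0007 });     // 0.0014 > 0.001: blocked

console.log(freeCall.ok, paidCall.ok, firstPaid.ok, secondPaid.ok); // true false true false
```

One edge this sketch does not cover: Math.max(0, NaN) evaluates to NaN, and every comparison against NaN is false, so a NaN cap would never block. Whether the real guard validates its input is not shown in the lesson; a Number.isFinite check before the clamp would be one way to close that gap.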

Pricing is always applied at metering

The map states a subtle invariant: pricing is always applied. record's helper costOf enforces it. A free-tier model is metered as 0 even when its ModelRunResult carries a costUsd (isFreeTierModel(spend.modelId), budget.ts:110), and a paid model with no explicit cost falls back to costForModel(modelId, usage). Spend is never guessed and never skipped: T2/T3 calls are metered to the registry price, so the running spent total is authoritative.
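The precedence described above can be sketched as follows. The order of checks (free tier first, then explicit costUsd, then the registry fallback) follows the lesson; everything else is assumed for illustration, including the Spend and Usage shapes, the PRICES table, the definition of "free tier" as both prices being 0, the per-1k arithmetic inside costForModel, and the choice to throw on an unknown model.

```typescript
interface Usage { inputTokens: number; outputTokens: number }
interface Spend { modelId: string; usage: Usage; costUsd?: number }
interface Price { inUsd: number; outUsd: number } // per 1k tokens

// Hypothetical price table taken from two representative registry rows.
const PRICES: Record<string, Price> = {
  'local-default': { inUsd: 0, outUsd: 0 },
  'glm-5.2': { inUsd: 0.0002, outUsd: 0.0006 },
};

// Assumed rounding: six decimal places.
const roundUsd = (usd: number): number => Math.round(usd * 1e6) / 1e6;

// Assumption: a model is free tier when both of its per-1k prices are 0.
const isFreeTierModel = (modelId: string): boolean => {
  const p = PRICES[modelId];
  return p !== undefined && p.inUsd === 0 && p.outUsd === 0;
};

// Assumed formula: tokens / 1000 times the per-1k price, summed over input and output.
const costForModel = (modelId: string, usage: Usage): number => {
  const p = PRICES[modelId];
  if (p === undefined) throw new Error(`unknown model: ${modelId}`); // sketch choice, not from the lesson
  return roundUsd((usage.inputTokens / 1000) * p.inUsd + (usage.outputTokens / 1000) * p.outUsd);
};

const costOf = (spend: Spend): number => {
  if (isFreeTierModel(spend.modelId)) return 0;                     // checked before costUsd is read
  return spend.costUsd ?? costForModel(spend.modelId, spend.usage); // explicit cost, else registry price
};

const usage: Usage = { inputTokens: 2000, outputTokens: 500 };
const strayFree = costOf({ modelId: 'local-default', usage, costUsd: 0.01 });
const pricedFromRegistry = costOf({ modelId: 'glm-5.2', usage });
const explicitCost = costOf({ modelId: 'glm-5.2', usage, costUsd: 0.002 });

console.log(strayFree, pricedFromRegistry, explicitCost); // 0 0.0007 0.002
```

Checking the free-tier case first is the design choice that matters: a stray costUsd on a $0 model never reaches the running total.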

1. You create createBudgetGuard(0) (or a negative cap). What can run?

Answer: only free calls. cap = Math.max(0, capUsd) clamps to 0, and check passes a call only if projectedUsd === 0 or it fits under the cap. With cap 0, only free calls (LOCAL/T0) pass. That is fail-closed: the default posture spends nothing.

2. Two T2 models cost (in+out) $0.0006 and $0.0009 per 1k. With no pinned modelId, which does pickCheapestForTier(T2) return, and is the choice deterministic?

Answer: the $0.0006 model, and yes, the choice is deterministic. The reduce keeps the entry with the smaller costPer1kInputUsd + costPer1kOutputUsd. It's pure over MODEL_REGISTRY (no Date/random), so the same registry always yields the same pick, which matters for replay (Lesson 28).

3. A ModelRunResult from a free-tier model carries a stray costUsd: 0.01. How does record meter it?

Answer: as 0. costOf (budget.ts:109-111) returns 0 for a free-tier model before reading costUsd. Pricing is always applied from the model's tier/registry, so a stray field can't inflate the spend total. Metering is authoritative, not advisory.

Common confusions

"LOCAL is the sixth tier." No — Tier is exactly T0–T4. LOCAL is an orthogonal marker that pins work to $0 models regardless of its tier. A unit can be both T2 and LOCAL: autonomous-with-a-reviewer, but kept on local models for privacy or cost.

"Tier sets the cost." They're related but separate. Tier is autonomy/oversight; cost is per-model pricing. A T3 council step uses pricier models, yes — but the budget guard, not the tier, is what stops a paid call from exceeding the cap. Two dials, two mechanisms.


Sources (read verbatim):
· packages/contracts/src/tier.ts — Tier T0–T4 (15–21), LOCAL orthogonal marker (36), TIER_LADDER (42–48), DEFAULT_TIER = T4 (51), isAutonomous T1–T3 (59).
· packages/contracts/src/registry.ts — entry shape (8–13), the 11 entries incl. local-default/glm-5.2/gpt-5.5-xhigh/claude-opus-4-8-max (42–132), DEFAULT_MODEL_ID = 'glm-5.2' (134), pickCheapestForTier combined-cost fold (145–156).
· packages/etl/src/budget.ts — createBudgetGuard fail-closed cap clamp + check/record (120–152), costOf free-tier-metered-as-0 (104–114).
· docs/alembic-complete-map.md §3 (T2/T3 metered, "pricing always applied") + §6 (registry split); ADR-0006 (cheapest-above-depth-floor).