Lesson 04 · The keystone result

The Closed Self-Improving Loop

Lesson 3 named the learning loop the keystone. Here is how it actually ships: three subsystems — memory, learning, curator — that together let a finished run make the next run smarter, without ever auto-writing an unvalidated lesson into durable memory.

Three parts, one loop

1 · Memory — the frozen snapshot

Two bounded, file-backed stores persist across sessions: MEMORY.md (the agent's own notes) and USER.md (what it knows about the user). Both are injected into the system prompt as a frozen snapshot at session start. The discipline that matters:

Mid-session writes hit disk immediately (durable) but do not change the snapshot — so the prompt-prefix cache stays warm for the whole session.
The snapshot refreshes only on the next session start (a fresh load). That's what makes "next run is smarter" literal.
Carried over from the Hermes source exactly: one memory op with action ∈ {add, replace, remove}; replace/remove locate the target by a short unique substring (no IDs); entries are delimited by § on its own line; limits are in characters, not tokens (model-independent).

// packages/hermes/src/memory/memory-store.ts:50-57
export const ENTRY_DELIMITER = '\n§\n';
/** Default character limit for the MEMORY.md store (Hermes default). */
export const DEFAULT_MEMORY_CHAR_LIMIT = 2200;
/** Default character limit for the USER.md store (Hermes default). */
export const DEFAULT_USER_CHAR_LIMIT = 1375;
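
The single memory op described above (three actions, substring targeting, a § delimiter, a character budget) can be sketched roughly as follows. This is a minimal illustration, not the Hermes code: `applyOp`, `MemoryOp`, `findUnique`, and `parseEntries` are hypothetical names, and returning an error when the store would exceed its limit is an assumption about overflow behavior.

```typescript
// Sketch only. applyOp, MemoryOp, findUnique and parseEntries are hypothetical names.
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };
const ok = <T>(value: T): Result<T> => ({ ok: true, value });
const fail = (message: string): { ok: false; error: Error } => ({
  ok: false,
  error: new Error(message),
});

const ENTRY_DELIMITER = '\n§\n';

type MemoryOp =
  | { action: 'add'; content: string }
  | { action: 'replace'; match: string; content: string }
  | { action: 'remove'; match: string };

const parseEntries = (text: string): string[] =>
  text
    .split(ENTRY_DELIMITER)
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);

// Resolve a substring to exactly one entry index; zero or several hits is an error.
const findUnique = (entries: string[], match: string): Result<number> => {
  const first = entries.findIndex((entry) => entry.includes(match));
  if (first === -1) return fail(`no entry contains "${match}"`);
  const second = entries.findIndex((entry, i) => i > first && entry.includes(match));
  if (second !== -1) return fail(`"${match}" is ambiguous; use a longer substring`);
  return ok(first);
};

const applyOp = (text: string, op: MemoryOp, charLimit: number): Result<string> => {
  const entries = parseEntries(text);
  if (op.action === 'add') {
    entries.push(op.content.trim());
  } else {
    const at = findUnique(entries, op.match);
    if (!at.ok) return at;
    if (op.action === 'replace') entries.splice(at.value, 1, op.content.trim());
    else entries.splice(at.value, 1);
  }
  const next = entries.join(ENTRY_DELIMITER);
  // String length counts UTF-16 code units: a model-independent stand-in for "characters".
  // Assumption: an over-budget write is refused rather than truncated.
  if (next.length > charLimit) return fail(`${next.length} chars exceeds limit ${charLimit}`);
  return ok(next);
};
```

Because targets are found by substring rather than by ID, the caller never has to track identifiers; the cost is that an ambiguous substring has to be refused instead of guessed at.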

This subsystem is a faithful CLONE of tools/memory_tool.py (1089 LOC). Deviations are deliberate: IO is injected via FsPort, and every fallible op returns Result<T,Error> instead of a Python dict.
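
To make the frozen-snapshot discipline and the two deviations concrete, here is a small sketch of a session that reads once through an injected port and returns a Result from every fallible call. The `FsPort` shape shown is an assumption (the lesson names the port but not its interface), and `openSession`, `MemorySession`, and `inMemoryFs` are hypothetical helpers.

```typescript
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };
const ok = <T>(value: T): Result<T> => ({ ok: true, value });

// Assumed port shape: the lesson names FsPort but does not show its interface.
interface FsPort {
  read(path: string): Promise<Result<string>>;
  write(path: string, text: string): Promise<Result<void>>;
}

// Hypothetical session handle.
interface MemorySession {
  promptBlock(): string; // what gets injected into the system prompt
  append(entry: string): Promise<Result<void>>;
}

const openSession = async (fs: FsPort, path: string): Promise<Result<MemorySession>> => {
  const loaded = await fs.read(path);
  if (!loaded.ok) return loaded;
  const snapshot = loaded.value; // captured once; never reassigned during the session
  let onDisk = loaded.value; // tracks the durable contents
  return ok<MemorySession>({
    promptBlock: () => snapshot,
    append: async (entry: string) => {
      const next = onDisk.length === 0 ? entry : `${onDisk}\n§\n${entry}`;
      const written = await fs.write(path, next); // durable immediately
      if (written.ok) onDisk = next;
      return written; // the snapshot is deliberately left alone
    },
  });
};

// In-memory FsPort for tests: no real disk is touched.
const inMemoryFs = (files: Map<string, string>): FsPort => ({
  read: async (path: string) => ok(files.get(path) ?? ''),
  write: async (path: string, text: string) => {
    files.set(path, text);
    return ok<void>(undefined);
  },
});
```

Under these assumptions, a write made mid-session is visible on disk at once but only reaches the prompt when the next session calls `openSession` again.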

2 · Learning — propose, then the Validator disposes

The single most important design choice — ADR-0018

Hermes auto-writes to memory after a turn. Alembic does not. The reviewer only proposes; Alembic's existing Validator disposes. Writes are Validator-gated, never auto-applied.

Why the change? Two reasons from the ADR, both principled:

There is no Python AIAgent in Alembic to fork as a daemon thread — a synchronous post-unit pass over injected ports is the right unit, and it composes with the harness.
More importantly, auto-writing would bypass the Validator Gate and let unvalidated lessons harden into durable memory — the exact failure mode ADR-0006 exists to prevent ("nothing sediments without clearing a quality floor").

So the loop is three injected ports and one kernel:

Port	Role
`ReviewProposer`	Returns `ReviewProposal`s from the turn summary — each a `{ target, op, rationale, score }`. In production it wraps one `ModelAdapter` call; in tests, a fake.
`ReviewGate`	Disposes each proposal (approve/reject). The default is `scoreThresholdGate(0.7)`; the real coda Validator wires in later by supplying its own gate — no change to the kernel.
`MemoryStore`	The store approved writes apply to — reusing its dedup, so re-seeing a fact reinforces rather than duplicates.
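
As a rough guide to how these three ports might be typed, consider the sketch below. The field names `target`, `op`, `rationale`, and `score` come from the table; their types, the `Verdict` and `ReviewDeps` names, the `gate` and `store` property names, and the `apply` method on the store are assumptions made for illustration.

```typescript
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };
const ok = <T>(value: T): Result<T> => ({ ok: true, value });

// Field names come from the lesson's table; the field types are assumptions.
interface ReviewProposal {
  target: 'memory' | 'user'; // assumed to select MEMORY.md or USER.md
  op: { action: 'add' | 'replace' | 'remove'; content?: string; match?: string };
  rationale: string;
  score: number;
}

interface Verdict {
  approved: boolean;
  reason: string;
}

type ReviewProposer = (summary: string) => Promise<Result<ReviewProposal[]>>;
type ReviewGate = (proposal: ReviewProposal) => Promise<Result<Verdict>>;

// Hypothetical method name; the real MemoryStore surface is not shown in this lesson.
interface MemoryStore {
  apply(proposal: ReviewProposal): Promise<Result<void>>;
}

interface ReviewDeps {
  proposer: ReviewProposer;
  gate: ReviewGate;
  store: MemoryStore;
}

// Test doubles: a proposer that returns canned proposals, and a store that records writes.
const fakeProposer = (canned: ReviewProposal[]): ReviewProposer => async () => ok(canned);

const recordingStore = (): MemoryStore & { writes: ReviewProposal[] } => {
  const writes: ReviewProposal[] = [];
  return {
    writes,
    apply: async (proposal: ReviewProposal) => {
      writes.push(proposal);
      return ok<void>(undefined);
    },
  };
};
```

Since every port is a plain function or small interface, a test can assemble a `ReviewDeps` from fakes without a model call or a file system.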

// packages/hermes/src/learning/review.ts:54-69 — the kernel
export const reviewAndLearn = async (summary, deps) => {
  if (summary.trim().length === 0) return ok(emptyOutcome());   // "Nothing to save."
  const proposed = await deps.proposer(summary);
  if (!proposed.ok) return proposed;                          // proposer error → fail closed
  if (proposed.value.length === 0) return ok(emptyOutcome());
  const acc = { applied: [], rejected: [], failed: [] };
  for (const raw of proposed.value) {
    const stepErr = await processOne(raw, deps, acc);   // validate → gate → apply
    if (stepErr) return stepErr;                            // gate error → fail closed
  }
  return ok({ applied: acc.applied, rejected: acc.rejected, failed: acc.failed });
};

Three outcome buckets — applied / rejected / failed — so nothing is silently dropped. Proposer output is Zod-validated at the boundary (it is untrusted model output in production). A proposer or gate error fails the whole pass closed; a store rejection of an approved write is recorded in failed, never thrown.
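
The validate → gate → apply step and its three buckets might look something like the sketch below. It is not the repo's `processOne`: the source validates with Zod, whereas this version uses a hand-written type guard to stay dependency-free, and routing a malformed proposal into `failed` (rather than failing the whole pass) is an assumption. The `Deps` and `Buckets` shapes are likewise hypothetical.

```typescript
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };
type Failure = { ok: false; error: Error };
const ok = <T>(value: T): Result<T> => ({ ok: true, value });

interface ReviewProposal {
  target: 'memory' | 'user';
  op: { action: 'add' | 'replace' | 'remove'; content?: string; match?: string };
  rationale: string;
  score: number;
}
interface Buckets {
  applied: ReviewProposal[];
  rejected: { proposal: ReviewProposal; reason: string }[];
  failed: { raw: unknown; reason: string }[];
}
interface Deps {
  gate: (p: ReviewProposal) => Promise<Result<{ approved: boolean; reason: string }>>;
  store: { apply: (p: ReviewProposal) => Promise<Result<void>> };
}

const isRecord = (v: unknown): v is Record<string, unknown> =>
  typeof v === 'object' && v !== null;

// Hand-written stand-in for the Zod schema the source uses at this boundary.
const isProposal = (raw: unknown): raw is ReviewProposal => {
  if (!isRecord(raw)) return false;
  const op = raw['op'];
  return (
    (raw['target'] === 'memory' || raw['target'] === 'user') &&
    isRecord(op) &&
    (op['action'] === 'add' || op['action'] === 'replace' || op['action'] === 'remove') &&
    typeof raw['rationale'] === 'string' &&
    typeof raw['score'] === 'number' &&
    Number.isFinite(raw['score'])
  );
};

// Returns a Failure only when the whole pass must fail closed; otherwise records and moves on.
const processOne = async (raw: unknown, deps: Deps, acc: Buckets): Promise<Failure | undefined> => {
  if (!isProposal(raw)) {
    acc.failed.push({ raw, reason: 'malformed proposal' }); // assumption: recorded, not fatal
    return undefined;
  }
  const verdict = await deps.gate(raw);
  if (!verdict.ok) return verdict; // gate error: fail closed
  if (!verdict.value.approved) {
    acc.rejected.push({ proposal: raw, reason: verdict.value.reason });
    return undefined;
  }
  const written = await deps.store.apply(raw);
  if (written.ok) acc.applied.push(raw);
  else acc.failed.push({ raw, reason: written.error.message }); // recorded, never thrown
  return undefined;
};
```

Every proposal ends up in exactly one bucket or stops the pass, which is what lets a caller audit a run without guessing what was dropped.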

// packages/hermes/src/learning/gate.ts:24-36 — the default conservative gate
export const scoreThresholdGate = (min = DEFAULT_REVIEW_SCORE_THRESHOLD) => {
  return async (proposal) => {
    const approved = proposal.score >= min;          // boundary inclusive: score === min approves
    const reason = approved
      ? `score ${proposal.score} ≥ threshold ${min}`
      : `score ${proposal.score} < threshold ${min} (learn only from validated wins)`;
    return ok({ approved, reason });                  // pure + total: ok(verdict) for every input
  };
};

The default threshold is 0.7 — the mechanical encoding of the hermes-mini-loop rule "learn only from validated wins." Note the decision lives in verdict.approved, not in the Result: a rejection is a normal ok(...), not an error.
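
The difference between a rejection and an error is easy to blur, so here is a small typed sketch of how a caller might read a gate's answer. The gate mirrors the excerpt above with shortened reason strings; `ReviewGate`, `Verdict`, `unavailableGate`, `Disposition`, and `dispose` are hypothetical names introduced for this example.

```typescript
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };
const ok = <T>(value: T): Result<T> => ({ ok: true, value });

interface Verdict {
  approved: boolean;
  reason: string;
}
type ReviewGate = (proposal: { score: number }) => Promise<Result<Verdict>>;

const DEFAULT_REVIEW_SCORE_THRESHOLD = 0.7;

// Same decision rule as the excerpt above, with types added and shorter reasons.
const scoreThresholdGate = (min: number = DEFAULT_REVIEW_SCORE_THRESHOLD): ReviewGate =>
  async (proposal) => {
    const approved = proposal.score >= min; // inclusive boundary
    return ok({ approved, reason: approved ? 'at or above threshold' : 'below threshold' });
  };

// A gate that cannot reach any decision (hypothetical failure case).
const unavailableGate: ReviewGate = async () => ({
  ok: false,
  error: new Error('gate unavailable'),
});

type Disposition = 'apply' | 'reject' | 'fail-closed';

// Check the Result first (did the gate run?), then the verdict (what did it decide?).
const dispose = async (gate: ReviewGate, score: number): Promise<Disposition> => {
  const result = await gate({ score });
  if (!result.ok) return 'fail-closed';
  return result.value.approved ? 'apply' : 'reject';
};
```

With the default gate, a score of exactly 0.7 resolves to `'apply'` and 0.6 to `'reject'`; only `unavailableGate` produces `'fail-closed'`.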

3 · Curator — the disposal half

The agent authors skills; telemetry accrues; the curator is the deterministic pass that keeps the skill library clean. It is a faithful CLONE of agent/curator.py:apply_automatic_transitions, with four rules cloned exactly:

Provenance gate: only createdBy === 'agent' skills are touched; everything else is skipped.
Pin exemption: a pinned skill is never transitioned, on any path.
Never delete: the terminal state is archived — "max action = archive." There is no removal.
The four transitions: an active skill past the archive cutoff → archived; a stale skill past the archive cutoff → archived; an active skill past the stale cutoff → stale; a stale skill used again → reactivated to active.

Time is an injected Clock, never Date.now() (the engine's determinism rule, and what makes the transition tests reproducible). The curator uses the same Clock the usage store was built with, so an event recorded "now" and a transition decided "now" agree.
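
The four rules and the injected Clock can be illustrated with a small pure function. This is a sketch under stated assumptions, not the cloned curator: the `Skill`, `Clock`, and `Cutoffs` shapes are hypothetical, "past the cutoff" is read as strictly greater than, a skill already archived is assumed to stay archived in the automatic pass, and the cutoff values in the usage note are invented for the example.

```typescript
type SkillState = 'active' | 'stale' | 'archived';

// Hypothetical shapes; the real curator types are not shown in this lesson.
interface Skill {
  name: string;
  createdBy: string;
  pinned: boolean;
  state: SkillState;
  lastUsedAt: number; // epoch milliseconds
}
interface Clock {
  now(): number; // epoch milliseconds
}
interface Cutoffs {
  staleAfterMs: number;
  archiveAfterMs: number;
}

const nextState = (skill: Skill, clock: Clock, cutoffs: Cutoffs): SkillState => {
  // Provenance gate and pin exemption: such skills are never transitioned.
  if (skill.createdBy !== 'agent' || skill.pinned) return skill.state;
  // Assumption: the automatic pass does not revive an archived skill.
  if (skill.state === 'archived') return 'archived';
  const idleMs = clock.now() - skill.lastUsedAt;
  if (idleMs > cutoffs.archiveAfterMs) return 'archived'; // active or stale → archived
  if (idleMs > cutoffs.staleAfterMs) return 'stale'; // active → stale
  return 'active'; // recently used: a stale skill is reactivated
};

// Returns a new array; no skill is ever removed, only relabelled.
const applyTransitions = (skills: Skill[], clock: Clock, cutoffs: Cutoffs): Skill[] =>
  skills.map((skill) => ({ ...skill, state: nextState(skill, clock, cutoffs) }));

// A fixed clock makes every transition test reproducible.
const fixedClock = (nowMs: number): Clock => ({ now: () => nowMs });
```

With `fixedClock` pinned to a known instant and, say, a 30-day stale cutoff and a 90-day archive cutoff, the same inputs always produce the same states, which is the property the determinism rule is protecting.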

Why gated, not auto-apply — the one idea to keep

Auto-apply would be faster. It was rejected on purpose. ADR-0018 considered "auto-apply writes after each run (literal Hermes behavior)" and rejected it: it bypasses the Validator Gate and lets unvalidated lessons harden into durable memory — the exact failure mode ADR-0006 exists to prevent. The whole point of the fusion is that the loop composes with the gate pipeline rather than going around it.
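
One way to picture "composes with the gate pipeline" is that anything matching the gate's function type can stand where the score threshold stands today. The sketch below adapts a made-up validator function to that type. The `Validate` signature is purely hypothetical (the coda Validator's real API is not shown in this lesson), and `countApproved` is only a stand-in for the kernel's gate step.

```typescript
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };
const ok = <T>(value: T): Result<T> => ({ ok: true, value });
const fail = (error: Error): { ok: false; error: Error } => ({ ok: false, error });

interface Verdict {
  approved: boolean;
  reason: string;
}
interface Proposal {
  rationale: string;
  score: number;
}
type ReviewGate = (proposal: Proposal) => Promise<Result<Verdict>>;

// Hypothetical validator signature, invented for this example.
type Validate = (proposal: Proposal) => Promise<{ pass: boolean; notes: string }>;

// Adapter: a validator's report becomes a verdict; a validator crash becomes a gate error.
const validatorGate = (validate: Validate): ReviewGate => async (proposal) => {
  try {
    const report = await validate(proposal);
    return ok({ approved: report.pass, reason: report.notes });
  } catch (cause) {
    return fail(cause instanceof Error ? cause : new Error(String(cause)));
  }
};

// Simple threshold gate, for comparison.
const thresholdGate = (min: number): ReviewGate => async (proposal) =>
  ok({ approved: proposal.score >= min, reason: `threshold ${min}` });

// Stand-in for the kernel's gate step: it depends only on the ReviewGate type.
const countApproved = async (gate: ReviewGate, proposals: Proposal[]): Promise<Result<number>> => {
  let approved = 0;
  for (const proposal of proposals) {
    const verdict = await gate(proposal);
    if (!verdict.ok) return verdict; // fail closed on a gate error
    if (verdict.value.approved) approved += 1;
  }
  return ok(approved);
};
```

Here `countApproved` runs unchanged whether it is handed `thresholdGate(0.7)` or `validatorGate(someValidator)`; only the injected function differs.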

1. A mid-session memory write succeeds. Does the system prompt change for the rest of that session?

Answer: no. The snapshot is frozen at session start. Writes are durable immediately but don't invalidate the prompt prefix, which is the whole point. "Next run is smarter" is literal: the refresh happens on the next session's load.

2. The reviewer proposes a write with score: 0.6 and the default gate is in use. What happens?

Answer: the write is rejected, and the pass does not fail. The default scoreThresholdGate(0.7) returns ok({approved:false, reason}); a rejection is a normal result, not an error. The proposal lands in rejected; only a proposer/gate error fails the pass closed.

3. The curator finds a long-unused skill with pinned: true and createdBy: 'user'. What does it do?

Answer: nothing; the skill is left untouched. Two guards both apply: the provenance gate only touches createdBy === 'agent' skills, and pinned skills are never transitioned. And the terminal state is archived; there is no delete path at all.

Common confusions

"The reviewer is a background daemon, like Hermes." No — in Alembic it's a synchronous post-unit pass over injected ports (ADR-0018). No thread, no fork; that's what makes it testable and composable with the harness.

"Gated means slow / human-in-the-loop on every write." No — the default gate is a pure score ≥ 0.7 check with no human and no I/O. "Gated" means a quality floor must be cleared; the floor can later be the full coda Validator by injecting a different gate — the kernel never changes.


Sources (all in the repo):
· docs/adr/0018-internalize-validator-gated-self-improvement-loop.md — the propose→dispose decision; the four fail-closed rules; the considered-and-rejected auto-apply option.
· packages/hermes/src/memory/memory-store.ts — frozen-snapshot store; ENTRY_DELIMITER='\n§\n', char limits 2200 / 1375 (lines 50–57).
· packages/hermes/src/learning/review.ts — reviewAndLearn kernel (lines 54–69) + processOne bucketing.
· packages/hermes/src/learning/gate.ts — scoreThresholdGate(0.7) (lines 24–36).
· packages/hermes/src/curator/curator.ts — provenance gate, pin-exemption, never-delete, four transitions, injected Clock (header, lines 1–32).
· docs/hermes-complete-map.md §1.9/§1.10/§3.2/§3.3/§5.1; ADR-0006 (validator as emission gate).