Lesson 04 · The keystone result

The Closed Self-Improving Loop

Lesson 3 named the learning loop the keystone. Here is how it actually ships: three subsystems — memory, learning, curator — that together let a finished run make the next run smarter, without ever auto-writing an unvalidated lesson into durable memory.

Three parts, one loop

Diagram: three boxes, 1 · memory/ (frozen snapshot @ start), 2 · learning/ (propose → gate → apply), and 3 · curator/ (active → stale → archived), joined by flows labeled turn summary, telemetry, and approved writes. Approved writes sediment, so the next session's snapshot is richer; the curator keeps the skill side clean (it never deletes).

1 · Memory — the frozen snapshot

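Two bounded, file-backed stores persist across sessions: MEMORY.md (the agent's own notes) and USER.md (what it knows about the user). Both are injected into the system prompt as a frozen snapshot at session start. The discipline that matters: a write made mid-session is durable on disk immediately but never changes the running session's prompt, and each store is capped at a fixed character budget, with entries separated by a fixed delimiter: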

// packages/hermes/src/memory/memory-store.ts:50-57
export const ENTRY_DELIMITER = '\n§\n';
/** Default character limit for the MEMORY.md store (Hermes default). */
export const DEFAULT_MEMORY_CHAR_LIMIT = 2200;
/** Default character limit for the USER.md store (Hermes default). */
export const DEFAULT_USER_CHAR_LIMIT = 1375;
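The frozen-snapshot rule is easiest to see in a tiny model of a session. This is a minimal sketch, assuming a session reads both stores once at start and builds its prompt prefix from them; `Session`, `startSession`, and the `## MEMORY.md` prompt layout are hypothetical, and a `Map` stands in for the files on disk.

```typescript
// Sketch only: Session, startSession and the prompt layout are hypothetical.
const ENTRY_DELIMITER = '\n§\n'; // same value as the memory-store.ts excerpt

type Files = Map<string, string>; // stands in for the two files on disk

interface Session {
  readonly systemPrompt: string; // built once at session start
  remember(path: string, entry: string): void;
}

const startSession = (files: Files): Session => {
  // Read both stores exactly once: this string is the frozen snapshot.
  const systemPrompt = ['MEMORY.md', 'USER.md']
    .map((name) => `## ${name}\n${files.get(name) ?? ''}`)
    .join('\n\n');
  return {
    systemPrompt,
    // Writes are durable immediately: they go straight to the backing store...
    remember: (path, entry) => {
      const prev = files.get(path);
      files.set(path, prev ? `${prev}${ENTRY_DELIMITER}${entry}` : entry);
    },
    // ...but systemPrompt was captured above and is never recomputed.
  };
};
```

Capturing the prompt as a plain string, rather than re-reading the files on every turn, is what keeps the prefix stable for the whole session.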

This subsystem is a faithful CLONE of tools/memory_tool.py (1089 LOC). Deviations are deliberate: IO is injected via FsPort, and every fallible op returns Result<T,Error> instead of a Python dict.
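Both deliberate deviations, injected IO and Result-returning operations, fit in a small sketch of one store write. It assumes the character limit applies to the whole serialized file and that dedup is an exact match on the trimmed entry; the `FsPort` method shapes, `memoryFs`, and `addEntry` are hypothetical, not the real memory-store.ts API.

```typescript
// Sketch only: FsPort's shape, memoryFs and addEntry are hypothetical stand-ins,
// not the real memory-store.ts API. The two constants match the excerpt above.
const ENTRY_DELIMITER = '\n§\n';
const DEFAULT_MEMORY_CHAR_LIMIT = 2200;

// Fallible operations return a Result instead of throwing or returning a dict.
type Result<T, E = Error> = { ok: true; value: T } | { ok: false; error: E };
const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = (message: string): Result<never> => ({ ok: false, error: new Error(message) });

// Injected IO port: the store itself never touches the real filesystem.
interface FsPort {
  read(path: string): Result<string | null>; // null: the file does not exist yet
  write(path: string, text: string): Result<void>;
}

// In-memory FsPort, the kind of fake a unit test would inject.
const memoryFs = (): FsPort & { files: Map<string, string> } => {
  const files = new Map<string, string>();
  return {
    files,
    read: (path) => ok(files.get(path) ?? null),
    write: (path, text) => {
      files.set(path, text);
      return ok(undefined);
    },
  };
};

type AddOutcome = 'added' | 'duplicate';

// Append one entry: reject malformed input, dedup exact repeats, enforce the budget.
const addEntry = (fs: FsPort, path: string, entry: string, charLimit: number): Result<AddOutcome> => {
  const text = entry.trim();
  if (text.length === 0 || text.includes(ENTRY_DELIMITER)) {
    return err('entry is empty or contains the delimiter');
  }
  const current = fs.read(path);
  if (!current.ok) return current;
  const entries: string[] = current.value ? current.value.split(ENTRY_DELIMITER) : [];
  if (entries.includes(text)) return ok<AddOutcome>('duplicate'); // nothing is written twice
  const next = [...entries, text].join(ENTRY_DELIMITER);
  if (next.length > charLimit) {
    return err(`store would be ${next.length} chars, over the ${charLimit}-char limit`);
  }
  const written = fs.write(path, next);
  return written.ok ? ok<AddOutcome>('added') : written;
};
```

Because a refused write comes back as `err(...)` rather than an exception, a caller can record it as data instead of unwinding.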

2 · Learning — propose, then the Validator disposes

The single most important design choice — ADR-0018

Hermes auto-writes to memory after a turn. Alembic does not. The reviewer only proposes; Alembic's existing Validator disposes. Writes are Validator-gated, never auto-applied.

Why the change? Two reasons from the ADR, both principled:

So the loop is three injected ports and one kernel:

| Port | Role |
| --- | --- |
| ReviewProposer | Returns ReviewProposals from the turn summary, each a { target, op, rationale, score }. In production it wraps one ModelAdapter call; in tests, a fake. |
| ReviewGate | Disposes each proposal (approve/reject). The default is scoreThresholdGate(0.7); the real coda Validator wires in later by supplying its own gate, with no change to the kernel. |
| MemoryStore | The store approved writes apply to. The kernel reuses its dedup, so re-seeing a fact reinforces it rather than duplicating it. |
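The ReviewGate row's promise (a stricter gate plugs in without touching the kernel) follows from the gate being nothing more than an injected function. A minimal sketch, assuming a simplified `Gate` signature; `minScoreGate`, `rationaleGate`, and `allGates` are hypothetical illustrations, not the coda Validator, whose interface this lesson does not show.

```typescript
// Sketch only: Gate, minScoreGate, rationaleGate and allGates are hypothetical;
// the coda Validator's real interface is not shown in this lesson.
type Result<T, E = Error> = { ok: true; value: T } | { ok: false; error: E };
const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });

interface Proposal { target: string; op: string; rationale: string; score: number }
interface Verdict { approved: boolean; reason: string }
type Gate = (proposal: Proposal) => Promise<Result<Verdict>>;

// A threshold gate in the spirit of scoreThresholdGate.
const minScoreGate = (min = 0.7): Gate => async (p) =>
  ok({ approved: p.score >= min, reason: `score ${p.score} vs threshold ${min}` });

// A second, independent floor with the same shape.
const rationaleGate: Gate = async (p) =>
  ok({ approved: p.rationale.trim().length > 0, reason: 'a rationale is required' });

// Compose gates: the first error or rejection wins; otherwise approve.
const allGates = (...gates: Gate[]): Gate => async (p) => {
  for (const gate of gates) {
    const verdict = await gate(p);
    if (!verdict.ok || !verdict.value.approved) return verdict;
  }
  return ok({ approved: true, reason: 'every gate approved' });
};

// The kernel only ever sees a Gate, so swapping one in changes nothing else.
const strictGate: Gate = allGates(minScoreGate(), rationaleGate);
```

Errors short-circuit just like rejections, so a composed gate still fails closed when any member errors.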
// packages/hermes/src/learning/review.ts:54-69 — the kernel
export const reviewAndLearn = async (summary, deps) => {
  if (summary.trim().length === 0) return ok(emptyOutcome());   // "Nothing to save."
  const proposed = await deps.proposer(summary);
  if (!proposed.ok) return proposed;                          // proposer error → fail closed
  if (proposed.value.length === 0) return ok(emptyOutcome());
  const acc = { applied: [], rejected: [], failed: [] };
  for (const raw of proposed.value) {
    const stepErr = await processOne(raw, deps, acc);   // validate → gate → apply
    if (stepErr) return stepErr;                            // gate error → fail closed
  }
  return ok({ applied: acc.applied, rejected: acc.rejected, failed: acc.failed });
};

Three outcome buckets — applied / rejected / failed — so nothing is silently dropped. Proposer output is Zod-validated at the boundary (it is untrusted model output in production). A proposer or gate error fails the whole pass closed; a store rejection of an approved write is recorded in failed, never thrown.
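The kernel's contract can be exercised end to end with fakes: validate each raw item, gate it, apply approved writes, fail closed on a proposer or gate error, and record store refusals. A minimal sketch, assuming simplified port signatures; `reviewAndLearnSketch`, `Deps`, and `Outcome` are hypothetical, a hand-written `isProposal` check stands in for the Zod schema, and recording an invalid item in `failed` is this sketch's choice, not necessarily the real kernel's.

```typescript
// Sketch only: simplified port signatures; reviewAndLearnSketch, Deps and Outcome
// are hypothetical names, and isProposal stands in for the real Zod schema.
type Result<T, E = Error> = { ok: true; value: T } | { ok: false; error: E };
const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const fail = (message: string): Result<never> => ({ ok: false, error: new Error(message) });

interface ReviewProposal { target: string; op: string; rationale: string; score: number }
interface Verdict { approved: boolean; reason: string }

interface Deps {
  proposer: (summary: string) => Promise<Result<unknown[]>>; // untrusted model output
  gate: (proposal: ReviewProposal) => Promise<Result<Verdict>>;
  store: { apply: (proposal: ReviewProposal) => Promise<Result<void>> };
}

interface Outcome {
  applied: ReviewProposal[];
  rejected: { proposal: ReviewProposal; reason: string }[];
  failed: { item: unknown; reason: string }[];
}

// Boundary check on untrusted output (the real code uses Zod for this).
const isProposal = (x: unknown): x is ReviewProposal => {
  if (typeof x !== 'object' || x === null) return false;
  const { target, op, rationale, score } = x as Record<string, unknown>;
  return typeof target === 'string' && typeof op === 'string'
    && typeof rationale === 'string' && typeof score === 'number';
};

const reviewAndLearnSketch = async (summary: string, deps: Deps): Promise<Result<Outcome>> => {
  const out: Outcome = { applied: [], rejected: [], failed: [] };
  if (summary.trim().length === 0) return ok(out); // nothing to save
  const proposed = await deps.proposer(summary);
  if (!proposed.ok) return proposed; // proposer error: fail closed
  for (const raw of proposed.value) {
    if (!isProposal(raw)) {
      out.failed.push({ item: raw, reason: 'invalid proposal shape' }); // assumption: recorded, not fatal
      continue;
    }
    const verdict = await deps.gate(raw);
    if (!verdict.ok) return verdict; // gate error: fail closed
    if (!verdict.value.approved) {
      out.rejected.push({ proposal: raw, reason: verdict.value.reason }); // a normal outcome
      continue;
    }
    const applied = await deps.store.apply(raw);
    if (applied.ok) out.applied.push(raw);
    else out.failed.push({ item: raw, reason: applied.error.message }); // recorded, never thrown
  }
  return ok(out);
};
```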

// packages/hermes/src/learning/gate.ts:24-36 — the default conservative gate
export const scoreThresholdGate = (min = DEFAULT_REVIEW_SCORE_THRESHOLD) => {
  return async (proposal) => {
    const approved = proposal.score >= min;          // boundary inclusive: score === min approves
    const reason = approved
      ? `score ${proposal.score} ≥ threshold ${min}`
      : `score ${proposal.score} < threshold ${min} (learn only from validated wins)`;
    return ok({ approved, reason });                  // pure + total: ok(verdict) for every input
  };
};

The default threshold is 0.7 — the mechanical encoding of the hermes-mini-loop rule "learn only from validated wins." Note the decision lives in verdict.approved, not in the Result: a rejection is a normal ok(...), not an error.
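Two properties of the default gate are worth pinning down: the boundary is inclusive, and a rejection is still a successful Result. A minimal sketch that restates the excerpt's logic with a locally defined `Result`, `ok`, and `Verdict` (assumed shapes; the real types live elsewhere in the package) and then calls it at and below the threshold.

```typescript
// Sketch only: Result, ok and Verdict are assumed shapes restated locally.
type Result<T, E = Error> = { ok: true; value: T } | { ok: false; error: E };
const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
interface Verdict { approved: boolean; reason: string }

const DEFAULT_REVIEW_SCORE_THRESHOLD = 0.7;

// Same decision rule as the excerpt: pure, total, inclusive at the boundary.
const scoreThresholdGate = (min = DEFAULT_REVIEW_SCORE_THRESHOLD) =>
  async (proposal: { score: number }): Promise<Result<Verdict>> => {
    const approved = proposal.score >= min;
    const reason = approved
      ? `score ${proposal.score} ≥ threshold ${min}`
      : `score ${proposal.score} < threshold ${min} (learn only from validated wins)`;
    return ok({ approved, reason });
  };

const gate = scoreThresholdGate();
Promise.all([gate({ score: 0.7 }), gate({ score: 0.6 })]).then(([atMin, below]) => {
  // Both results are ok(...); only verdict.approved differs.
  console.log(atMin.ok && atMin.value.approved); // true
  console.log(below.ok && !below.value.approved); // true
});
```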

3 · Curator — the disposal half

The agent authors skills; telemetry accrues; the curator is the deterministic pass that keeps the skill library clean. It is a faithful CLONE of agent/curator.py:apply_automatic_transitions, with four rules cloned exactly:

Time is an injected Clock, never Date.now(): that is the engine's determinism rule, and it is what makes the transition tests reproducible. The curator uses the same Clock the usage store was built with, so an event recorded "now" and a transition decided "now" agree.

Skill states: active → stale once past staleAfter; stale → active when used again (reactivate); stale → archived once past archiveAfter; an active skill already past archiveAfter goes straight to archived (skips stale).
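The transition rules can be written as a pure function of a skill record and an injected clock. A minimal sketch, assuming each skill carries a `lastUsedAt` timestamp that usage updates, that durations are milliseconds, and that archiveAfter exceeds staleAfter; `Skill`, `Clock`, `fixedClock`, `Policy`, `nextState`, and `curate` are hypothetical names. The guards and transitions follow this lesson: agent-authored, unpinned skills only; active → stale → archived; reactivation on reuse; no delete path.

```typescript
// Sketch only: Skill, Clock, fixedClock, Policy, nextState and curate are hypothetical.
type State = 'active' | 'stale' | 'archived'; // no 'deleted': there is no delete path

interface Skill {
  name: string;
  state: State;
  createdBy: 'agent' | 'user';
  pinned: boolean;
  lastUsedAt: number; // ms timestamp, updated on use via the same Clock
}

// Injected time source: the curator never calls Date.now() itself.
interface Clock { now(): number }
const fixedClock = (t: number): Clock => ({ now: () => t }); // deterministic in tests

interface Policy { staleAfter: number; archiveAfter: number } // idle ms; archiveAfter > staleAfter

const nextState = (skill: Skill, clock: Clock, policy: Policy): State => {
  // Guards: only agent-authored, unpinned skills are ever transitioned.
  if (skill.createdBy !== 'agent' || skill.pinned) return skill.state;
  if (skill.state === 'archived') return 'archived'; // terminal in this sketch
  const idle = clock.now() - skill.lastUsedAt;
  if (idle > policy.archiveAfter) return 'archived'; // an active skill skips stale here
  if (idle > policy.staleAfter) return 'stale';
  return 'active'; // a stale skill that was used again is reactivated
};

// One pass maps every skill to its next state; nothing is ever removed.
const curate = (skills: Skill[], clock: Clock, policy: Policy): Skill[] =>
  skills.map((skill) => ({ ...skill, state: nextState(skill, clock, policy) }));
```

Since `fixedClock` pins "now", the same inputs always produce the same transitions, which is the reproducibility the Clock rule buys.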

Why gated, not auto-apply — the one idea to keep

Auto-apply would be faster. It was rejected on purpose. ADR-0018 considered "auto-apply writes after each run (literal Hermes behavior)" and rejected it: it bypasses the Validator Gate and lets unvalidated lessons harden into durable memory — the exact failure mode ADR-0006 exists to prevent. The whole point of the fusion is that the loop composes with the gate pipeline rather than going around it.

Check yourself

1. A mid-session memory write succeeds. Does the system prompt change for the rest of that session?
Answer: no. The snapshot is frozen at session start. Writes are durable immediately but don't invalidate the prompt prefix; that's the whole point. "Next run is smarter" is literal: the refresh happens when the next session loads.
2. The reviewer proposes a write with score: 0.6 and the default gate is in use. What happens?
Answer: the proposal lands in rejected and the pass carries on. The default scoreThresholdGate(0.7) returns ok({approved:false, reason}): a rejection is a normal result, not an error. Only a proposer or gate error fails the pass closed.
3. The curator finds a long-unused skill with pinned: true and createdBy: 'user'. What does it do?
Answer: nothing; the skill is left as it is. Two guards both apply: the provenance gate only touches createdBy === 'agent' skills, and pinned skills are never transitioned. And even for skills the curator does touch, the terminal state is archived: there is no delete path at all.

Common confusions

"The reviewer is a background daemon, like Hermes." No — in Alembic it's a synchronous post-unit pass over injected ports (ADR-0018). No thread, no fork; that's what makes it testable and composable with the harness.
"Gated means slow / human-in-the-loop on every write." No — the default gate is a pure score ≥ 0.7 check with no human and no I/O. "Gated" means a quality floor must be cleared; the floor can later be the full coda Validator by injecting a different gate — the kernel never changes.