Lesson 3 named the learning loop the keystone. Here is how it actually ships: three subsystems — memory, learning, curator — that together let a finished run make the next run smarter, without ever auto-writing an unvalidated lesson into durable memory.
Two bounded, file-backed stores persist across sessions: MEMORY.md (the agent's own notes) and USER.md (what it knows about the user). Both are injected into the system prompt as a frozen snapshot at session start. The discipline that matters:
- A single memory op with action ∈ {add, replace, remove}.
- replace/remove locate the target by a short unique substring (no IDs).
- Entries are delimited by § on its own line.
- Limits are in characters, not tokens (model-independent).

```typescript
// packages/hermes/src/memory/memory-store.ts:50-57
export const ENTRY_DELIMITER = '\n§\n';

/** Default character limit for the MEMORY.md store (Hermes default). */
export const DEFAULT_MEMORY_CHAR_LIMIT = 2200;

/** Default character limit for the USER.md store (Hermes default). */
export const DEFAULT_USER_CHAR_LIMIT = 1375;
```
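The store discipline can be sketched as one pure function over the file's text. This is a minimal illustration, not the Hermes source: `MemoryOp`, `applyOp`, `locate`, and the field names `match` and `content` are hypothetical, the exact-match dedup on add is an assumption, and `String.length` counts UTF-16 code units, which only approximates "characters".

```typescript
// Illustrative sketch only; MemoryOp, applyOp and locate are hypothetical names.
const ENTRY_DELIMITER = '\n§\n';
const DEFAULT_MEMORY_CHAR_LIMIT = 2200;

type Failure = { ok: false; error: Error };
type Result<T> = { ok: true; value: T } | Failure;
const ok = <T>(value: T): Result<T> => ({ ok: true, value });
const fail = (message: string): Failure => ({ ok: false, error: new Error(message) });

type MemoryOp =
  | { action: 'add'; content: string }
  | { action: 'replace'; match: string; content: string }
  | { action: 'remove'; match: string };

// Index of the one entry containing `match`; zero or several hits is an error.
const locate = (entries: string[], match: string): Result<number> => {
  const hits: number[] = [];
  entries.forEach((entry, i) => {
    if (entry.includes(match)) hits.push(i);
  });
  if (hits.length !== 1) {
    return fail(`"${match}" matched ${hits.length} entries; need exactly 1`);
  }
  return ok(hits[0] as number);
};

const applyOp = (
  text: string,
  op: MemoryOp,
  limit: number = DEFAULT_MEMORY_CHAR_LIMIT,
): Result<string> => {
  const entries = text === '' ? [] : text.split(ENTRY_DELIMITER);
  if (op.action === 'add') {
    if (entries.includes(op.content)) return ok(text); // assumed dedup: exact re-add is a no-op
    entries.push(op.content);
  } else {
    const at = locate(entries, op.match);
    if (!at.ok) return at; // no match or ambiguous match: nothing is written
    if (op.action === 'replace') entries[at.value] = op.content;
    else entries.splice(at.value, 1);
  }
  const next = entries.join(ENTRY_DELIMITER);
  // The budget is measured on the stored text, not on model tokens.
  if (next.length > limit) {
    return fail(`store would be ${next.length} chars; limit is ${limit}`);
  }
  return ok(next);
};
```

Because the lookup requires exactly one hit, a substring that is too short to be unique fails loudly instead of editing the wrong entry.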
This subsystem is a faithful CLONE of tools/memory_tool.py (1089 LOC). Deviations are deliberate: IO is injected via FsPort, and every fallible op returns Result<T,Error> instead of a Python dict.
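To make the two deviations concrete, here is a hedged sketch of injected IO plus `Result` returns. The lesson does not show the real `FsPort`, so the `read`/`write` method names, their synchronous shape, the `inMemoryFs` fake, and the `appendNote` helper are all assumptions made for illustration.

```typescript
// Illustrative sketch only; the real FsPort's methods are not shown in the lesson.
type Failure = { ok: false; error: Error };
type Result<T> = { ok: true; value: T } | Failure;

// Hypothetical port shape: every fallible call returns a Result instead of throwing.
interface FsPort {
  read(path: string): Result<string>;
  write(path: string, text: string): Result<void>;
}

// A fake for tests: same interface, backed by a Map instead of the disk.
const inMemoryFs = (files: Map<string, string> = new Map()): FsPort => ({
  read: (path) => {
    const text = files.get(path);
    if (text === undefined) return { ok: false, error: new Error(`not found: ${path}`) };
    return { ok: true, value: text };
  },
  write: (path, text) => {
    files.set(path, text);
    return { ok: true, value: undefined };
  },
});

// Hypothetical helper: append one entry, treating a missing file as an empty store.
const appendNote = (port: FsPort, path: string, note: string): Result<string> => {
  const current = port.read(path);
  const next = current.ok ? `${current.value}\n§\n${note}` : note;
  const written = port.write(path, next);
  if (!written.ok) return written; // the write error is passed to the caller as a value
  return { ok: true, value: next };
};
```

The caller chooses the port, so the same store logic runs against the disk in production and against a `Map` in tests, and the type system forces every failure path to be handled.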
Hermes auto-writes to memory after a turn. Alembic does not. The reviewer only proposes; Alembic's existing Validator disposes. Writes are Validator-gated, never auto-applied.
Why the change? Two reasons from the ADR, both principled:
- There is no AIAgent in Alembic to fork as a daemon thread; a synchronous post-unit pass over injected ports is the right unit, and it composes with the harness.

So the loop is three injected ports and one kernel:
| Port | Role |
|---|---|
| ReviewProposer | Returns ReviewProposals from the turn summary, each a { target, op, rationale, score }. In production it wraps one ModelAdapter call; in tests, a fake. |
| ReviewGate | Disposes each proposal (approve/reject). The default is scoreThresholdGate(0.7); the real coda Validator wires in later by supplying its own gate, with no change to the kernel. |
| MemoryStore | The store approved writes apply to, reusing its dedup, so re-seeing a fact reinforces rather than duplicates. |
```typescript
// packages/hermes/src/learning/review.ts:54-69 (the kernel)
export const reviewAndLearn = async (summary, deps) => {
  if (summary.trim().length === 0) return ok(emptyOutcome()); // "Nothing to save."
  const proposed = await deps.proposer(summary);
  if (!proposed.ok) return proposed; // proposer error → fail closed
  if (proposed.value.length === 0) return ok(emptyOutcome());
  const acc = { applied: [], rejected: [], failed: [] };
  for (const raw of proposed.value) {
    const stepErr = await processOne(raw, deps, acc); // validate → gate → apply
    if (stepErr) return stepErr; // gate error → fail closed
  }
  return ok({ applied: acc.applied, rejected: acc.rejected, failed: acc.failed });
};
```
Three outcome buckets — applied / rejected / failed — so nothing is silently dropped. Proposer output is Zod-validated at the boundary (it is untrusted model output in production). A proposer or gate error fails the whole pass closed; a store rejection of an approved write is recorded in failed, never thrown.
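The kernel delegates each proposal to `processOne`, whose body is not shown in the lesson. The sketch below is one plausible shape for its validate → gate → apply sequence, under stated assumptions: the field types of `ReviewProposal`, the `Deps` and `Acc` shapes, and the `store.apply` method are hypothetical; a hand-written type guard stands in for the Zod schema; and filing a malformed proposal under `failed` is a guess, since the lesson does not say which bucket it lands in.

```typescript
// Illustrative sketch only; types and bucket choice for malformed input are assumptions.
type Failure = { ok: false; error: Error };
type Result<T> = { ok: true; value: T } | Failure;

interface ReviewProposal { target: string; op: string; rationale: string; score: number }
interface Verdict { approved: boolean; reason: string }
interface Deps {
  gate: (proposal: ReviewProposal) => Promise<Result<Verdict>>;
  store: { apply: (proposal: ReviewProposal) => Result<void> };
}
interface Acc {
  applied: ReviewProposal[];
  rejected: { proposal: ReviewProposal; reason: string }[];
  failed: { raw: unknown; reason: string }[];
}

// Stand-in for the Zod schema: model output is untrusted until it passes this check.
const isProposal = (raw: unknown): raw is ReviewProposal => {
  if (typeof raw !== 'object' || raw === null) return false;
  const r = raw as Record<string, unknown>;
  return typeof r['target'] === 'string' && typeof r['op'] === 'string'
    && typeof r['rationale'] === 'string' && typeof r['score'] === 'number';
};

// Returns undefined to continue, or a Failure that aborts the whole pass.
const processOne = async (raw: unknown, deps: Deps, acc: Acc): Promise<Failure | undefined> => {
  if (!isProposal(raw)) {
    acc.failed.push({ raw, reason: 'malformed proposal' }); // assumed bucket
    return undefined;
  }
  const verdict = await deps.gate(raw);
  if (!verdict.ok) return verdict; // gate error → fail closed
  if (!verdict.value.approved) {
    acc.rejected.push({ proposal: raw, reason: verdict.value.reason });
    return undefined;
  }
  const written = deps.store.apply(raw);
  if (written.ok) acc.applied.push(raw);
  else acc.failed.push({ raw, reason: written.error.message }); // recorded, never thrown
  return undefined;
};
```

Every path either pushes to exactly one bucket or returns a `Failure`, which is what lets the kernel claim that nothing is silently dropped.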
```typescript
// packages/hermes/src/learning/gate.ts:24-36 (the default conservative gate)
export const scoreThresholdGate = (min = DEFAULT_REVIEW_SCORE_THRESHOLD) => {
  return async (proposal) => {
    const approved = proposal.score >= min; // boundary inclusive: score === min approves
    const reason = approved
      ? `score ${proposal.score} ≥ threshold ${min}`
      : `score ${proposal.score} < threshold ${min} (learn only from validated wins)`;
    return ok({ approved, reason }); // pure + total: ok(verdict) for every input
  };
};
```
The default threshold is 0.7 — the mechanical encoding of the hermes-mini-loop rule "learn only from validated wins." Note the decision lives in verdict.approved, not in the Result: a rejection is a normal ok(...), not an error.
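The distinction between a rejection and an error is easiest to see from the caller's side. The following sketch restates the gate with explicit types and adds a hypothetical `route` helper that maps a gate result to the kernel's next step; `ReviewGate` as written, `NextStep`, and `route` are illustrative names, not part of the source.

```typescript
// Illustrative sketch only; route and NextStep are hypothetical names.
type Failure = { ok: false; error: Error };
type Result<T> = { ok: true; value: T } | Failure;
const ok = <T>(value: T): Result<T> => ({ ok: true, value });

interface Verdict { approved: boolean; reason: string }
type ReviewGate = (proposal: { score: number }) => Promise<Result<Verdict>>;

const DEFAULT_REVIEW_SCORE_THRESHOLD = 0.7;

const scoreThresholdGate = (min: number = DEFAULT_REVIEW_SCORE_THRESHOLD): ReviewGate =>
  async (proposal) => {
    const approved = proposal.score >= min; // inclusive boundary
    const reason = `score ${proposal.score} ${approved ? '≥' : '<'} threshold ${min}`;
    return ok({ approved, reason });
  };

type NextStep = 'apply' | 'reject' | 'abort';

// Two independent questions: did the gate run (Result), and what did it decide (Verdict)?
const route = (result: Result<Verdict>): NextStep => {
  if (!result.ok) return 'abort'; // the gate itself failed → fail the pass closed
  return result.value.approved ? 'apply' : 'reject'; // a rejection is a normal outcome
};
```

Supplying a stricter policy is then a matter of passing a different `ReviewGate`, for example `scoreThresholdGate(0.9)`, without touching the code that consumes the verdict.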
The agent authors skills; telemetry accrues; the curator is the deterministic pass that keeps the skill library clean. It is a faithful CLONE of agent/curator.py:apply_automatic_transitions, with four rules cloned exactly:
- Only createdBy === 'agent' skills are touched; everything else is skipped.
- A pinned skill is never transitioned, on any path.
- The terminal state is archived: "max action = archive." There is no removal.
- Time is an injected Clock, never Date.now() (the engine's determinism rule, and what makes the transition tests reproducible). The curator uses the same Clock the usage store was built with, so an event recorded "now" and a transition decided "now" agree.
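The four rules can be sketched as a pure transition function. Only `createdBy`, `pinned`, `archived`, and the injected `Clock` come from the lesson. The intermediate `stale` state, the 30-day and 90-day thresholds, the `lastUsedAt` field, the `Clock.now()` shape, and the function names are hypothetical choices made to give the rules something to act on.

```typescript
// Illustrative sketch only; states other than 'archived' and all thresholds are assumptions.
interface Clock { now(): number } // epoch milliseconds (assumed shape)

type SkillState = 'active' | 'stale' | 'archived';
interface Skill {
  name: string;
  createdBy: 'agent' | 'user';
  pinned: boolean;
  state: SkillState;
  lastUsedAt: number; // epoch milliseconds
}

const DAY_MS = 24 * 60 * 60 * 1000;
const STALE_AFTER_DAYS = 30;   // hypothetical threshold
const ARCHIVE_AFTER_DAYS = 90; // hypothetical threshold

const nextState = (skill: Skill, clock: Clock): SkillState => {
  if (skill.createdBy !== 'agent') return skill.state; // rule 1: only agent-authored skills
  if (skill.pinned) return skill.state;                // rule 2: pinned never transitions
  if (skill.state === 'archived') return 'archived';   // rule 3: archived is terminal
  const idleDays = (clock.now() - skill.lastUsedAt) / DAY_MS; // rule 4: time comes from the Clock
  if (idleDays >= ARCHIVE_AFTER_DAYS) return 'archived';
  if (idleDays >= STALE_AFTER_DAYS) return 'stale';
  return 'active';
};

// map, never filter: the library can shrink in activity but not in membership.
const applyTransitions = (skills: Skill[], clock: Clock): Skill[] =>
  skills.map((skill) => ({ ...skill, state: nextState(skill, clock) }));
```

With a fixed clock such as `{ now: () => 100 * DAY_MS }`, the same input always yields the same transitions, which is the reproducibility the injected Clock is there to provide.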
A proposal has score: 0.6 and the default gate is in use. What happens?

scoreThresholdGate(0.7) returns ok({approved:false, reason}); a rejection is a normal result, not an error. It lands in rejected; only a proposer/gate error fails the pass closed.

The curator meets a skill with pinned: true and createdBy: 'user'. What does it do?

It leaves the skill alone: the curator only touches createdBy === 'agent' skills, and pinned skills are never transitioned. And the terminal state is archived; there is no delete path at all.

The default gate is a score ≥ 0.7 check with no human and no I/O. "Gated" means a quality floor must be cleared; the floor can later be the full coda Validator by injecting a different gate, and the kernel never changes.