A bounded, file-backed memory the agent carries across sessions — two stores (MEMORY.md for the agent's own notes, USER.md for what it knows about you), edited by a single memory operation that locates entries by a short unique substring. Its defining trick is the frozen snapshot: what the system prompt sees is captured once at load and never moves, even as mid-session writes hit disk. This is a faithful CLONE of Hermes' tools/memory_tool.py (1089 LOC).
The whole subsystem is one class and a few constants, all named exports from packages/hermes/src/index.ts:
| Export | Role |
|---|---|
| `MemoryStore` | the class: `load()` / `renderSnapshot()` / `entries()` / `apply()` |
| `ENTRY_DELIMITER` | `'\n§\n'`: the section-sign delimiter between entries |
| `DEFAULT_MEMORY_CHAR_LIMIT` | 2200: char cap for MEMORY.md |
| `DEFAULT_USER_CHAR_LIMIT` | 1375: char cap for USER.md |
| `MemoryOp` | the discriminated op union: add / replace / remove |
| `MemoryOpOutcome` | terminal payload: target, message, entryCount, usedChars, charLimit |
The operation itself is a compile-time discriminated union; there is no string-typed "params" bag:

    // packages/hermes/src/memory/memory-store.ts:85-88
    export type MemoryOp =
      | { readonly action: 'add'; readonly content: string }
      | { readonly action: 'replace'; readonly oldText: string; readonly content: string }
      | { readonly action: 'remove'; readonly oldText: string };

Only the two values that cross an untyped boundary in the Python source — target and action — are Zod-validated at runtime (memory/schema.ts: z.enum(['memory','user']) and z.enum(['add','replace','remove'])). The op payload is a compile-time union, so an illegal shape can't even be written.
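To illustrate what the compile-time union buys, here is a minimal sketch of an exhaustive `switch` over the op. The `MemoryOp` type mirrors the one shown above; `describeOp` is a hypothetical helper invented for this example and is not part of the package.

```typescript
// Mirrors the MemoryOp union from the source; everything else is illustrative.
type MemoryOp =
  | { readonly action: 'add'; readonly content: string }
  | { readonly action: 'replace'; readonly oldText: string; readonly content: string }
  | { readonly action: 'remove'; readonly oldText: string };

// Hypothetical helper (an assumption, not a package export): one line per op.
function describeOp(op: MemoryOp): string {
  switch (op.action) {
    case 'add':
      // Narrowed: only `content` exists here, `oldText` would not compile.
      return `add "${op.content}"`;
    case 'replace':
      return `replace entry containing "${op.oldText}" with "${op.content}"`;
    case 'remove':
      return `remove entry containing "${op.oldText}"`;
    default: {
      // If a new action is added to the union, this assignment stops compiling.
      const unreachable: never = op;
      return unreachable;
    }
  }
}

const summary = describeOp({ action: 'replace', oldText: 'tabs', content: 'prefers spaces' });
```

Writing `{ action: 'replace', content: 'x' }` without `oldText` is rejected by the type checker, which is the sense in which an illegal shape cannot be written.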
The store holds two parallel realities. load() reads disk, dedupes, and freezes a snapshot; apply() mutates live entries and persists them, but never touches the snapshot:

    // packages/hermes/src/memory/memory-store.ts:119-147 (condensed)
    async load(): Promise<Result<void, Error>> {
      // … read MEMORY.md + USER.md via FsPort …
      this.memoryEntries = dedupe(memory.value); // Python dict.fromkeys parity
      this.userEntries = dedupe(user.value);
      this.snapshot = { // frozen ONCE, here
        memory: this.renderBlock('memory', this.memoryEntries),
        user: this.renderBlock('user', this.userEntries),
      };
      return ok(undefined);
    }

    renderSnapshot(target: MemoryTarget): string | undefined {
      const block = this.snapshot[target]; // load-time state, not live
      return block.length > 0 ? block : undefined;
    }

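The split between live entries and the frozen snapshot can be sketched in memory. This is a simplified stand-in, not the real class: `SnapshotDemo` and its `add` method are hypothetical, there is no disk or FsPort, and the snapshot is rendered as a plain delimiter join rather than whatever `renderBlock` actually produces.

```typescript
const ENTRY_DELIMITER = '\n§\n';

// Hypothetical in-memory stand-in for the store (an assumption for illustration).
class SnapshotDemo {
  private live: string[] = [];
  private snapshot = '';

  // Dedupe in first-seen order, then freeze the rendered text once.
  load(fromDisk: readonly string[]): void {
    this.live = Array.from(new Set(fromDisk));
    this.snapshot = this.live.join(ENTRY_DELIMITER);
  }

  // Mutates live entries only; the snapshot is deliberately left alone.
  add(content: string): void {
    if (!this.live.includes(content)) this.live.push(content);
  }

  entries(): readonly string[] {
    return this.live;
  }

  renderSnapshot(): string | undefined {
    return this.snapshot.length > 0 ? this.snapshot : undefined;
  }
}

const demo = new SnapshotDemo();
demo.load(['likes tabs', 'likes tabs', 'uses pnpm']); // duplicate collapses on load
const before = demo.renderSnapshot();
demo.add('prefers dark mode'); // visible via entries(), not via renderSnapshot()
const after = demo.renderSnapshot();
```

Here `before` and `after` are the same string, while `demo.entries()` already includes the new entry.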
The snapshot is captured once, at load(). The test "snapshot reflects load-time disk state, not mid-session writes" proves it: after an apply(), renderSnapshot() is byte-identical to before, yet disk already contains the new entry.

replace and remove don't take an index or an id. They take a short substring and the store finds the one entry that contains it. The matcher fails closed on ambiguity:

    // packages/hermes/src/memory/memory-store.ts:369-392 (locateUnique)
    const matches = entries.flatMap((entry, index) =>
      entry.includes(needle) ? [{ index, entry }] : [],
    );
    if (matches.length === 0) return err(new Error(`No entry matched '${needle}'.`));
    if (matches.length > 1) {
      const distinct = new Set(matches.map((m) => m.entry));
      if (distinct.size > 1) {
        return err(new Error(`Multiple entries matched '${needle}'. Be more specific.`));
      }
      // All identical: safe to operate on the first.
    }

The subtlety: multiple matches are only an error when they are distinct entries. If every match is the exact same text (true duplicates), acting on the first is safe; this is the source's fidelity rule. Duplicates can't arrive via an add op (it dedupes), so the test seeds them on disk and loads to exercise that branch.
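The three outcomes (no match, distinct matches, identical duplicates) can be exercised with a self-contained sketch. `locateUniqueSketch` and the `Located` type are hypothetical names for this example; the real function returns the package's own `Result` type, which is not reproduced here.

```typescript
// Hypothetical result shape (an assumption; the real code uses ok()/err()).
type Located =
  | { readonly ok: true; readonly index: number }
  | { readonly ok: false; readonly error: string };

function locateUniqueSketch(entries: readonly string[], needle: string): Located {
  // Collect the index of every entry that contains the needle.
  const hits: number[] = [];
  entries.forEach((entry, index) => {
    if (entry.includes(needle)) hits.push(index);
  });

  const first = hits[0];
  if (first === undefined) {
    return { ok: false, error: `No entry matched '${needle}'.` };
  }

  // Several hits are only ambiguous when their text differs.
  const distinct = new Set(hits.map((i) => entries[i]));
  if (distinct.size > 1) {
    return { ok: false, error: `Multiple entries matched '${needle}'. Be more specific.` };
  }

  // One hit, or several byte-identical hits: act on the first.
  return { ok: true, index: first };
}

const ambiguous = locateUniqueSketch(['task: write docs', 'task: fix bug'], 'task:');
const duplicates = locateUniqueSketch(['same note', 'same note'], 'same');
const unique = locateUniqueSketch(['task: write docs', 'task: fix bug'], 'bug');
const missing = locateUniqueSketch(['task: write docs'], 'nope');
```

Under these assumptions, `ambiguous` and `missing` are errors, `duplicates` resolves to index 0, and `unique` resolves to index 1.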
Limits are counted in characters, not tokens, because char counts are model-independent. Crucially, the ENTRY_DELIMITER counts against the budget: the store measures the joined length, exactly what lands on disk:

    // packages/hermes/src/memory/memory-store.ts:344-346
    const joinedLength = (entries: readonly string[]): number =>
      entries.length === 0 ? 0 : entries.join(ENTRY_DELIMITER).length;

The test "counts the delimiter against the budget" pins it: 'aaa' + '\n§\n' (3 chars) + 'bbb' = 9 chars, so a limit of 9 passes and 8 fails. When an add would overflow, the error isn't a bare failure — it tells the agent to consolidate ("use 'replace' to merge … or 'remove' stale entries, then retry"), turning a hard cap into a curation prompt.
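The arithmetic in that test can be reproduced directly. `joinedLength` is copied from the source; `fitsAfterAdd` is a hypothetical helper added for illustration, and the real overflow check may be structured differently.

```typescript
const ENTRY_DELIMITER = '\n§\n';

// Same definition as the source: the length of what would land on disk.
const joinedLength = (entries: readonly string[]): number =>
  entries.length === 0 ? 0 : entries.join(ENTRY_DELIMITER).length;

// Hypothetical helper (an assumption): would adding `content` stay within the cap?
function fitsAfterAdd(
  entries: readonly string[],
  content: string,
  charLimit: number,
): boolean {
  return joinedLength([...entries, content]) <= charLimit;
}

// 'aaa' (3) + delimiter (3) + 'bbb' (3) = 9
const total = joinedLength(['aaa', 'bbb']);
const fitsAtNine = fitsAfterAdd(['aaa'], 'bbb', 9);
const fitsAtEight = fitsAfterAdd(['aaa'], 'bbb', 8);
```

One caveat for this sketch: JavaScript's `String.length` counts UTF-16 code units, so characters outside the Basic Multilingual Plane count as two, whereas Python's `len()` counts code points. Whether the real store compensates for that is not stated in this section.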
A corrupt or missing file on read returns ok([]) (an empty store) — a telemetry-like hot path must never break the host. But a failed write returns err — the caller explicitly chose to persist, so a silent loss would be a lie. The same asymmetry recurs in the curator's usage store; it is a deliberate, repeated design stance, not an accident.
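A minimal sketch of that read/write asymmetry follows. The `TinyFs` interface, the `Result` shape, and both functions are assumptions made for this example; the real FsPort and Result types are not shown in this section and may differ.

```typescript
// Hypothetical Result shape and file port (assumptions for illustration).
type Result<T> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: Error };

interface TinyFs {
  read(path: string): string; // may throw
  write(path: string, data: string): void; // may throw
}

const ENTRY_DELIMITER = '\n§\n';

// Reads fail open: any read problem yields an empty store, never an error.
function readEntries(fs: TinyFs, path: string): Result<string[]> {
  try {
    const raw = fs.read(path);
    return { ok: true, value: raw.length === 0 ? [] : raw.split(ENTRY_DELIMITER) };
  } catch {
    return { ok: true, value: [] };
  }
}

// Writes fail loudly: the caller asked to persist, so a loss must be reported.
function writeEntries(fs: TinyFs, path: string, entries: readonly string[]): Result<void> {
  try {
    fs.write(path, entries.join(ENTRY_DELIMITER));
    return { ok: true, value: undefined };
  } catch (cause) {
    return { ok: false, error: cause instanceof Error ? cause : new Error(String(cause)) };
  }
}

// A file system where every call throws, to show both branches.
const brokenFs: TinyFs = {
  read: () => {
    throw new Error('ENOENT');
  },
  write: () => {
    throw new Error('EROFS');
  },
};

const readResult = readEntries(brokenFs, 'MEMORY.md');
const writeResult = writeEntries(brokenFs, 'MEMORY.md', ['note']);
```

With the broken file system, `readResult` is an ok result holding an empty array, while `writeResult` carries the error.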
apply('memory', {action:'add', …}) runs mid-session. What does renderSnapshot('memory') return immediately after? The load-time snapshot, unchanged: it is frozen at load() and never mutated mid-session, preserving the prompt-prefix cache. The write is durable on disk and visible via entries(), but the in-prompt snapshot only refreshes on the next session's load().

replace is called with oldText: 'task:' and two distinct entries contain 'task:'. What happens? It fails closed: locateUnique returns err when matches are distinct, and the store leaves every entry untouched. (If the matches were exact duplicates of one another, acting on the first is the source's fidelity rule, but distinct matches always fail closed.)

Because entries are joined with '\n§\n', the budget includes those 3 chars per gap; the test proves that 9 passes and 8 fails for two 3-char entries.

Mid-session writes are not lost: they are durable on disk and visible via entries(). Only the in-prompt snapshot is frozen, and only until the next load(). Durability and prefix-cache stability are independent concerns the design keeps separate.

"A § in my note will corrupt the file." No: the store splits on the full '\n§\n' sequence, not a bare §. An entry whose body contains a lone § round-trips as one entry; there's a dedicated test for exactly that.
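The round-trip claim for a lone § can be checked with a small sketch. `serialize` and `parse` are hypothetical names for this example; the real read and write paths go through the store and its file port.

```typescript
const ENTRY_DELIMITER = '\n§\n';

// Hypothetical helpers (assumptions): join on write, split on the full sequence on read.
const serialize = (entries: readonly string[]): string => entries.join(ENTRY_DELIMITER);
const parse = (raw: string): string[] =>
  raw.length === 0 ? [] : raw.split(ENTRY_DELIMITER);

// The first entry contains a bare section sign inside its body.
const original = ['see §4.2 of the spec', 'prefers tabs'];
const roundTripped = parse(serialize(original));
```

`roundTripped` has the same two entries as `original`, because a bare § without the surrounding newlines never matches the delimiter. An entry whose body contained the full `'\n§\n'` sequence would split in this sketch; how the real store treats that case is not covered in this section.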