The Hermes Agent is Nous Research's "self-improving AI agent" — a million-line Python codebase. You cannot fuse what you do not understand, so the fusion began with a map. This lesson is that map's core: what Hermes is, the one structural idea that makes it extensible, and the closed learning loop that became the whole reason for the fusion.
Scale reality: the map is not a line-by-line read of all 1.18M LOC — it deep-dives the priorities and catalogs the long tail, with the coverage frontier stated explicitly. Source: docs/hermes-complete-map.md §1, §6.
Hermes is a personal AI agent that runs the same agent core across many surfaces — a CLI, a messaging gateway (Telegram, Discord, Slack, and ~20 other platforms), an Ink/React TUI, and an Electron desktop app. Built by Nous Research, MIT-licensed. Its headline differentiators, in its own words:
hermes model, no code changes.1 · Per-conversation prompt caching is sacred. A long conversation reuses a cached prefix every turn; anything that mutates past context or rebuilds the system prompt mid-conversation invalidates the cache and multiplies cost. The only sanctioned exception is context compression. 2 · The core is a narrow waist; capability lives at the edges. Every model tool is sent on every API call, so the bar for a new core tool is high — new capability arrives as a CLI command + skill or a plugin, not as core surface.
Notice the rhyme with Lesson 1: Hermes and Alembic independently arrived at "a narrow waist + cache-stable prompt." That shared instinct is exactly why the fusion composes instead of fighting — and it is the thread we pull on in Lesson 3.
This is the single most important and most portable structural idea in Hermes. Tools self-register at import time; nothing maintains a central import list. Each tool file calls registry.register(...) at module level; a discovery pass globs the directory and imports every file, and the import side-effect populates the registry.
There is a crucial nuance the map insists on — and it is the source of a number that gets mis-quoted. Auto-discovery registers a tool's schema, but the tool is only exposed to an agent if its name appears in a toolset in toolsets.py. So three different counts describe three different things:
| Count | What it actually measures |
|---|---|
| 87 | tool files in tools/*.py (verified ls | wc -l). Includes infra files like registry.py, security scanners, and shared helpers — not all are agent-facing tools. |
| 30 | toolset keys in toolsets.py (browser, memory, skills, web, …) — the named bundles a platform inherits from. |
| "64 tools" | The figure the orange-book learning material cites for agent-facing tools. We keep the file/tool distinction explicit rather than collapsing all three into one number. |
The headline property is concrete in-code, not marketing. After a turn, the agent can fork itself to review what just happened and decide whether anything is worth remembering — and the writes land in durable stores the next session reads.
The pieces, each concrete in source:
agent/background_review.py): after a turn, spawn_background_review fires a daemon thread that replays the conversation snapshot in a forked AIAgent and asks "should any skill/memory be saved or updated?". It runs with a tool whitelist limited to memory + skill-management tools only, and it inherits the parent's cached system prompt verbatim — so it hits the same prefix cache and never touches the live conversation._iters_since_skill counter and nudges the model to persist a skill after enough tool-calling iterations.agent/curator.py): auto-manages the lifecycle of agent-created skills — active → stale → archived, never deletes, pinned skills exempt, only created_by: "agent" skills touched.volatile tier holds the MEMORY.md snapshot + USER.md, which is exactly why they are frozen at session start — mutating them mid-session would break the cache.For Alembic, this whole loop — fork-after-turn → self-review under a memory/skill-only whitelist → write to durable stores → curator lifecycle → next session reads the refreshed snapshot — is the highest-value conceptual CLONE, and it composes cleanly with Alembic's existing validator/gate pipeline.
The hermes-mini-loop repo is a minimal reference implementation of the same idea, and it states the discipline as three rules — the rules that keep a self-improving loop from drowning in its own noise:
| Rule | What it prevents |
|---|---|
| Learn only from wins (score ≥ 0.7) | Sedimenting lessons from failed or low-confidence turns. Only validated successes become durable memory. |
| Reinforce, don't duplicate | Re-seeing the same fact should strengthen the existing entry, not append a near-copy that bloats the bounded store. |
| Recombine proven atoms | New skills are composed from already-validated pieces rather than invented wholesale — improvement by recombination. |
Hold onto the 0.7 threshold and "reinforce-don't-duplicate" — both reappear, ported verbatim, as the default gate and the dedup behavior of Alembic's loop in Lesson 4.
providers/ + plugins/model-providers/ + a local OpenAI-compatible proxy — a different subsystem entirely.tools/. Exposure is gated by the 30 toolset keys; the orange-book counts ~64 agent-facing tools. Three numbers, three meanings.