Lesson 02 · The system we learned from

Reverse-Engineering Hermes

The Hermes Agent is Nous Research's "self-improving AI agent" — a million-line Python codebase. You cannot fuse what you do not understand, so the fusion began with a map. This lesson is that map's core: what Hermes is, the one structural idea that makes it extensible, and the closed learning loop that became the whole reason for the fusion.

~2,497

Python files

~1.18M

lines of code

tool files in tools/

toolset keys

Scale reality: the map is not a line-by-line read of all 1.18M LOC — it deep-dives the priorities and catalogs the long tail, with the coverage frontier stated explicitly. Source: docs/hermes-complete-map.md §1, §6.

What Hermes is

Hermes is a personal AI agent that runs the same agent core across many surfaces — a CLI, a messaging gateway (Telegram, Discord, Slack, and ~20 other platforms), an Ink/React TUI, and an Electron desktop app. Built by Nous Research, MIT-licensed. Its headline differentiators, in its own words:

A closed learning loop — agent-curated memory with periodic nudges; autonomous skill creation after complex tasks; skills that self-improve during use; FTS5 session search with LLM summarization for cross-session recall. the part that matters for us
Model-agnostic — Nous Portal, OpenRouter (200+ models), and many others; switch with hermes model, no code changes.
Lives where you do — one gateway process bridges Telegram / Discord / Slack / WhatsApp / Signal / CLI, with voice-memo transcription.
Delegates and parallelizes — spawns isolated subagents for parallel workstreams.
Runs anywhere — six terminal backends behind one ABC: local, Docker, SSH, Singularity, Modal, Daytona.

The two design constraints everything is downstream of

1 · Per-conversation prompt caching is sacred. A long conversation reuses a cached prefix every turn; anything that mutates past context or rebuilds the system prompt mid-conversation invalidates the cache and multiplies cost. The only sanctioned exception is context compression. 2 · The core is a narrow waist; capability lives at the edges. Every model tool is sent on every API call, so the bar for a new core tool is high — new capability arrives as a CLI command + skill or a plugin, not as core surface.

Notice the rhyme with Lesson 1: Hermes and Alembic independently arrived at "a narrow waist + cache-stable prompt." That shared instinct is exactly why the fusion composes instead of fighting — and it is the thread we pull on in Lesson 3.

The linchpin: the tool-registration architecture

This is the single most important and most portable structural idea in Hermes. Tools self-register at import time; nothing maintains a central import list. Each tool file calls registry.register(...) at module level; a discovery pass globs the directory and imports every file, and the import side-effect populates the registry.

There is a crucial nuance the map insists on — and it is the source of a number that gets mis-quoted. Auto-discovery registers a tool's schema, but the tool is only exposed to an agent if its name appears in a toolset in toolsets.py. So three different counts describe three different things:

Count	What it actually measures
87	tool files in `tools/*.py` (verified `ls \| wc -l`). Includes infra files like `registry.py`, security scanners, and shared helpers — not all are agent-facing tools.
30	toolset keys in `toolsets.py` (`browser`, `memory`, `skills`, `web`, …) — the named bundles a platform inherits from.
"64 tools"	The figure the orange-book learning material cites for agent-facing tools. We keep the file/tool distinction explicit rather than collapsing all three into one number.

Why this matters for the map. "How many tools does Hermes have?" has no single right answer — it depends whether you count files, toolset keys, or exposed tools. A faithful reverse-engineering states all three and what each measures, instead of picking the roundest.

The closed learning loop — the "self-improving" claim, mechanically (§1.10)

The headline property is concrete in-code, not marketing. After a turn, the agent can fork itself to review what just happened and decide whether anything is worth remembering — and the writes land in durable stores the next session reads.

The pieces, each concrete in source:

Background review fork (agent/background_review.py): after a turn, spawn_background_review fires a daemon thread that replays the conversation snapshot in a forked AIAgent and asks "should any skill/memory be saved or updated?". It runs with a tool whitelist limited to memory + skill-management tools only, and it inherits the parent's cached system prompt verbatim — so it hits the same prefix cache and never touches the live conversation.
Skill nudges: the loop bumps an _iters_since_skill counter and nudges the model to persist a skill after enough tool-calling iterations.
Curator (agent/curator.py): auto-manages the lifecycle of agent-created skills — active → stale → archived, never deletes, pinned skills exempt, only created_by: "agent" skills touched.
Memory lives inside the prompt: the system prompt's volatile tier holds the MEMORY.md snapshot + USER.md, which is exactly why they are frozen at session start — mutating them mid-session would break the cache.

The one sentence to carry into Lesson 3

For Alembic, this whole loop — fork-after-turn → self-review under a memory/skill-only whitelist → write to durable stores → curator lifecycle → next session reads the refreshed snapshot — is the highest-value conceptual CLONE, and it composes cleanly with Alembic's existing validator/gate pipeline.

The mini-loop discipline (§5)

The hermes-mini-loop repo is a minimal reference implementation of the same idea, and it states the discipline as three rules — the rules that keep a self-improving loop from drowning in its own noise:

Rule	What it prevents
Learn only from wins (score ≥ 0.7)	Sedimenting lessons from failed or low-confidence turns. Only validated successes become durable memory.
Reinforce, don't duplicate	Re-seeing the same fact should strengthen the existing entry, not append a near-copy that bloats the bounded store.
Recombine proven atoms	New skills are composed from already-validated pieces rather than invented wholesale — improvement by recombination.

Hold onto the 0.7 threshold and "reinforce-don't-duplicate" — both reappear, ported verbatim, as the default gate and the dedup behavior of Alembic's loop in Lesson 4.

Common confusions

"Hermes is a model gateway." Its gateway is a messaging gateway (Telegram/Discord/…), not a model gateway. The model-provider role is distributed across providers/ + plugins/model-providers/ + a local OpenAI-compatible proxy — a different subsystem entirely.

"The background reviewer changes the current conversation." No — it is a fork. It writes to durable stores; the live conversation and its prompt cache are never mutated. "Self-improving" is literal but deferred: the improvement shows up on the next session's load.

"87 = the number of tools." 87 is the number of files in tools/. Exposure is gated by the 30 toolset keys; the orange-book counts ~64 agent-facing tools. Three numbers, three meanings.

1. In Hermes, what is the relationship between the 87 and the 30?

Correct: b. Auto-discovery registers a tool's schema (87 files, many of them infra/helpers), but a tool is only exposed if its name is in a toolset (30 keys). The orange-book's "64 tools" is yet a third figure — agent-facing tools. A faithful map keeps the file/tool distinction.

2. Why does the background review run in a forked agent on a daemon thread, with a memory/skill-only tool whitelist?

Correct: d. Constraint #1 (prompt caching is sacred) forbids mutating the live conversation. A fork inherits the cached prompt verbatim, writes to durable stores, and leaves the main turn untouched — so "self-improving" is real but deferred to the next session.

3. The mini-loop's "learn only from wins (score ≥ 0.7)" rule exists to prevent what?

Correct: c. A self-improving loop that learns from everything quickly poisons itself. The 0.7 floor (plus "reinforce, don't duplicate" and "recombine proven atoms") keeps only validated wins — and that exact threshold is ported into Alembic's gate in Lesson 4.

← Lesson 1 Lesson 3 →

Sources (all in the repo):
· docs/hermes-complete-map.md §1.1 (what Hermes is), §1.2 (the two constraints), §1.5 (the tool-registration linchpin — 87 files), §1.9 (30 toolset keys; curator), §1.10 (the closed learning loop), §5 (hermes-mini-loop + orange-book), §6 (coverage statement: ~2,497 files / ~1.18M LOC).
· Subject repo: NousResearch/hermes-agent + satellites hermes-war-room, hermes-agent-orange-book (the "64 tools" figure), hermes-mini-loop.
The file/tool/toolset distinction (87 / 30 / 64) is stated verbatim from the map; the LOC scale is the map's coverage statement, not a full read. ← Course hub · Português