Lesson 6 told the story of the orphaned-worker leak: 16 stray vitest processes pinning ~1550% CPU. This lesson is the engineering underneath the fix: how a UNIX process group works, why detached:true creates one, why kill(-pgid) reaches every descendant where kill(pid) can't, and how the three-layer defense (a hardened vitest.config.ts, a process-group-killing safe-test.mjs wrapper, and a post-run sweep) makes the leak structurally impossible. The wrapper is a small file, but every line is load-bearing, and the lesson generalizes to any tool that spawns a worker tree.
Vitest's tinypool runs tests in worker child processes. If a test hangs with no teardown (an open socket, an unresolved promise, a live interval) and the parent is killed by PID alone, the workers don't die — they get reparented to PID 1 (the init process) and keep running, each pinning a core. Killing the parent process is not enough: you have to kill the whole tree, and a tree that has already reparented is no longer reachable from the parent at all.
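The failure mode is easy to reproduce outside Vitest. The sketch below assumes a POSIX system and uses two throwaway `node -e` processes as hypothetical stand-ins for the vitest main process and one tinypool worker. It kills the stand-in runner by PID alone and then checks whether the worker is still alive.

```javascript
import { spawn } from 'node:child_process';

// Hypothetical stand-in for a hung worker: report our pid, then idle forever.
const workerSrc = 'console.log(process.pid); setInterval(() => {}, 1000);';

// Hypothetical stand-in for the test runner: start one worker, then idle.
const runnerSrc = `
  const { spawn } = require('node:child_process');
  spawn(process.execPath, ['-e', ${JSON.stringify(workerSrc)}], { stdio: 'inherit' });
  setInterval(() => {}, 1000);
`;

const runner = spawn(process.execPath, ['-e', runnerSrc], {
  stdio: ['ignore', 'pipe', 'inherit'],
});

// The worker inherits the runner's stdout pipe, so its pid arrives here.
const workerPid = await new Promise((resolve) => {
  runner.stdout.once('data', (chunk) => resolve(Number(String(chunk).trim())));
});

// Kill ONLY the runner, by pid, and wait until it has exited.
const runnerExited = new Promise((resolve) => runner.once('exit', resolve));
runner.kill('SIGKILL');
await runnerExited;

// Signal 0 sends nothing; it only checks that the target still exists.
let workerSurvived;
try {
  workerSurvived = process.kill(workerPid, 0);
} catch {
  workerSurvived = false;
}
console.log(`worker survived the runner's death: ${workerSurvived}`);

// Clean up the orphan explicitly so the demo itself does not leak.
process.kill(workerPid, 'SIGKILL');
```

The worker was never signalled, so it keeps running after its parent is gone. In the real leak there was no final cleanup line, which is why the workers kept spinning.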
The first defense stops the hang from happening in the first place. The shared config sets bounded timeouts and the forks pool, so a stuck file fails fast and is force-killed on teardown rather than spinning forever:

    // vitest.config.ts:13-28 (the anti-orphan hardening)
    export default defineConfig({
      test: {
        environment: 'node',
        // A hung test (server/socket/MCP/interval/unresolved promise with no
        // teardown) must FAIL on a bounded timeout, never hang a worker forever.
        testTimeout: 15_000,
        hookTimeout: 15_000,
        teardownTimeout: 10_000,
        pool: 'forks', // isolates hangs in child processes Vitest force-kills on teardown
      },
    });

Why pool:'forks' and not worker threads (the default pool in Vitest releases before 2.0)? A hang inside a thread can wedge the host process; a hang inside a fork (a separate OS process) is isolated, and "Vitest force-kills on teardown, so a stuck file cannot pin a CPU core." The timeouts turn an infinite wait into a test failure: visible, bounded, and CI-red.
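What a bounded timeout buys can be shown without Vitest at all. The sketch below is not Vitest's implementation; `withTimeout` is a hypothetical helper that races a promise that never settles against a timer, so the wait ends in a rejection instead of hanging.

```javascript
// Hypothetical helper (not part of Vitest): reject if `promise` has not
// settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer so it cannot
  // keep the event loop alive afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// A "test body" that never resolves, like an await on a socket that never answers.
const hung = new Promise(() => {});

const outcome = await withTimeout(hung, 50, 'hung test').then(
  () => 'passed',
  (err) => `failed: ${err.message}`,
);
console.log(outcome); // failed: hung test timed out after 50ms
```

The timer converts "never finishes" into an ordinary failure with a message, which is the property testTimeout, hookTimeout, and teardownTimeout give the suite.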
Config alone can't cover every escape (a native handle, a SIGKILL'd parent mid-run). So scripts/safe-test.mjs runs the whole suite in its own process group and kills the group, not the PID. The key is detached:true:

    // scripts/safe-test.mjs:34-44
    // detached:true => POSIX setsid => the child leads a NEW process group, so
    // kill(-pid) reaches every descendant (vitest main + all tinypool forks).
    const child = spawn(bin, args, { stdio: 'inherit', detached: true });

    const killGroup = (signal) => {
      try {
        process.kill(-child.pid, signal); // NEGATIVE pid = the whole group
      } catch {
        /* group already gone */
      }
    };

On POSIX, kill(pid, sig) signals one process; kill(-pgid, sig) signals every process in that group. With detached:true, Node calls setsid() in the child, which makes it the leader of a new session and a new process group whose id equals its PID. Descendants inherit that group unless they deliberately leave it, so process.kill(-child.pid, …) reaches the vitest main process and every tinypool fork in one syscall. That's the difference between "killed the parent, orphaned the kids" and "killed the family."
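The group-kill can be observed end to end with a small experiment. This is a sketch assuming a POSIX system; the two `node -e` processes are hypothetical stand-ins for vitest main and a tinypool fork, and the shared stdout pipe is used only as a liveness probe.

```javascript
import { spawn } from 'node:child_process';

// Hypothetical stand-ins: a "main" process that starts one idle "worker".
const workerSrc = 'setInterval(() => {}, 1000);';
const mainSrc = `
  const { spawn } = require('node:child_process');
  spawn(process.execPath, ['-e', ${JSON.stringify(workerSrc)}], { stdio: 'inherit' });
  console.log('ready');
  setInterval(() => {}, 1000);
`;

const child = spawn(process.execPath, ['-e', mainSrc], {
  stdio: ['ignore', 'pipe', 'inherit'],
  detached: true, // new session + process group; child.pid is the group id
});

// Wait for 'ready' so the worker exists before anything is signalled.
await new Promise((resolve) => child.stdout.once('data', resolve));

// Negative pid = signal every process in group `child.pid`.
process.kill(-child.pid, 'SIGKILL');

// The worker inherited the stdout pipe. The stream only ends once every
// process holding its write end has exited, so 'end' means the worker died too.
const groupGone = await new Promise((resolve) => {
  const timer = setTimeout(() => resolve(false), 5000);
  child.stdout.once('end', () => {
    clearTimeout(timer);
    resolve(true);
  });
  child.stdout.resume();
});
console.log(groupGone ? 'whole group exited' : 'something survived');
```

Replacing the group-kill with `child.kill('SIGKILL')` leaves the worker holding the pipe open, so the stream never ends within the five-second window.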
A bounded wall-clock timer escalates politely, then forcibly: SIGTERM first (let it clean up), SIGKILL 5 seconds later (force it), then a sweep, and exit 124 (the status GNU timeout uses for a command that timed out):

    // scripts/safe-test.mjs:46-58
    const timer = setTimeout(() => {
      timedOut = true;
      process.stderr.write(`\n[safe-test] HARD TIMEOUT after ${TIMEOUT_MS}ms — killing process group -${child.pid}\n`);
      killGroup('SIGTERM'); // ask nicely
      setTimeout(() => {
        killGroup('SIGKILL'); // then force, 5s later
        sweep();
        process.exit(124); // conventional "timed out" code
      }, 5_000);
    }, TIMEOUT_MS);
    timer.unref(); // don't keep the event loop alive for the timer alone

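To see why the second signal matters, here is a minimal sketch of the same escalation against a child that ignores SIGTERM. It assumes a POSIX system; `terminateGroup` and the 200 ms grace period are hypothetical choices for the demo, not names or values from safe-test.mjs.

```javascript
import { spawn } from 'node:child_process';

// Hypothetical helper: SIGTERM the child's process group, then SIGKILL it if
// it has not exited within `graceMs`. Resolves with whatever ended the child.
function terminateGroup(child, graceMs) {
  return new Promise((resolve) => {
    const signalGroup = (signal) => {
      try {
        process.kill(-child.pid, signal);
      } catch {
        /* group already gone */
      }
    };
    const force = setTimeout(() => signalGroup('SIGKILL'), graceMs);
    child.once('exit', (code, signal) => {
      clearTimeout(force);
      resolve(signal ?? `exit code ${code}`);
    });
    signalGroup('SIGTERM'); // ask nicely first
  });
}

// A stubborn child that ignores SIGTERM, standing in for a wedged test run.
const stubborn = spawn(
  process.execPath,
  ['-e', "process.on('SIGTERM', () => {}); console.log('up'); setInterval(() => {}, 1000);"],
  { stdio: ['ignore', 'pipe', 'inherit'], detached: true },
);

// 'up' is printed after the SIGTERM handler is installed.
await new Promise((resolve) => stubborn.stdout.once('data', resolve));

const endedBy = await terminateGroup(stubborn, 200);
console.log(`child ended by ${endedBy}`);
```

A process can catch or ignore SIGTERM but not SIGKILL, so the grace period bounds how long a misbehaving tree can stall the shutdown.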
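The group-kill only reaches processes that are still in the wrapper's group. Reparenting to PID 1 does not by itself change a process's group id, so an orphaned fork that stayed in the group is still killed. What escapes is a process that has left the group (for example by starting its own session) or a stray from a run that was never launched through the wrapper. The last-resort sweep catches those by name, mirroring the operator's manual net pgrep -f vitest | xargs kill -9: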

    // scripts/safe-test.mjs:24-32
    const sweep = () => {
      try {
        execFileSync('pkill', ['-9', '-f', 'vitest'], { stdio: 'ignore' });
      } catch {
        /* nothing left to kill: pkill exits non-zero when no match */
      }
    };

The sweep runs on every exit path — normal exit, signal, and timeout — so leakage "must not accumulate" (safe-test.mjs:80). On a clean exit the wrapper still calls killGroup('SIGKILL') to "reap any fork still lingering in the group," then sweeps. Belt and suspenders, because the cost of a leak is hours of pinned CPU.
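One way to guarantee "every exit path" is to route all of them through a single guarded cleanup function. The sketch below is an assumption about the shape of that wiring, not the contents of safe-test.mjs: `cleanup`, `finished`, and `log` are hypothetical names, the child is a trivial stand-in for the test run, and `sweep` is stubbed so the demo does not actually run pkill. The timeout path from the earlier excerpt would call the same function.

```javascript
import { spawn } from 'node:child_process';

const log = []; // records cleanup steps for the demo
const sweep = () => log.push('sweep'); // stub standing in for the pkill sweep

// Stand-in for the test run: a child that exits cleanly with code 0.
const child = spawn(process.execPath, ['-e', 'process.exit(0)'], {
  stdio: 'inherit',
  detached: true,
});

const killGroup = (signal) => {
  try {
    process.kill(-child.pid, signal);
  } catch {
    /* group already gone */
  }
  log.push(`killGroup(${signal})`);
};

// Hypothetical single funnel: every exit path ends here, exactly once.
let finished = false;
const cleanup = () => {
  if (finished) return false;
  finished = true;
  killGroup('SIGKILL'); // reap any fork still lingering in the group
  sweep();
  return true;
};

// Path 1: the wrapper itself is interrupted or terminated.
for (const signal of ['SIGINT', 'SIGTERM']) {
  process.on(signal, () => {
    if (cleanup()) process.exit(1); // non-zero: the run was interrupted
  });
}

// Path 2: the child exits on its own (pass or fail).
const exitCode = await new Promise((resolve) => {
  child.on('exit', (code) => {
    cleanup();
    resolve(code ?? 1);
  });
});
console.log(`${log.join(' -> ')} (exit code ${exitCode})`);
process.exitCode = exitCode;
```

The `finished` flag keeps the cleanup idempotent, so a signal that arrives while the child is already exiting cannot run the kill and sweep twice.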
| Layer | Catches | Misses (handed to next) |
|---|---|---|
| config timeouts + forks | most hangs — they fail fast and Vitest force-kills the fork | a parent killed externally mid-run; a native handle Vitest can't reap |
| group-kill (detached) | the whole live tree in one syscall | a process outside the group: one that started its own session, or a stray left by an earlier run |
| pkill sweep | any stray vitest by name, including PID-1 orphans | — (the floor) |
Each layer's miss is the next layer's job. That is defense in depth: no single mechanism is trusted to be perfect, and the failure mode (a pinned core for hours) is severe enough to justify the redundancy.
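Q: Why does safe-test.mjs spawn the suite with detached:true?
A: detached:true ⇒ setsid ⇒ the child is a group leader whose PID is the group id. A negative PID in kill addresses the whole group, so one syscall reaches the entire worker tree, which is exactly what a plain kill(pid) cannot do.

Q: Why does the wrapper still run a name-based sweep after the group-kill?
A: A stray vitest process that is not in the wrapper's group is invisible to the group-kill: kill(-pgid) can't reach it. The name-based sweep is precisely the last-resort net for that case, and it runs on every exit path.

Q: Why set testTimeout/teardownTimeout and use pool:'forks' instead of relying only on the kill-the-group wrapper?
A: The config stops the hang at its source: a stuck file fails on a bounded timeout and Vitest force-kills the fork, so the wrapper's group-kill and sweep act as a backstop rather than the routine path.

Misconception: "kill(pid) kills the children too." It doesn't; it signals one process. Children survive and reparent to PID 1. You need the process-group form (kill(-pgid)) to reach the tree, which is exactly why detached:true exists in the wrapper.

A clean run leaves pgrep -f vitest empty.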