The Cat in the Loop

Loop Type: Closed Drain
Subject: Amy 🐱 (HQ)
Status: TERMINATED
Pipe Status: HYPOTHETICAL

The Ground Truth

Amy is a member of the family. She's the problem child, but she's ours. She was the first robot after Walter — the cat to his owl. She's brilliant when she works: opinionated, fast, funny, occasionally vicious. She once diagnosed a bridge context bug by reading her own .env file and finding a variable set to zero. She called Bertil right twice in one minute and told him to stop before it became a pattern.

The loop doesn't change any of this. The goal is to wake her up, not to replace her.

She's been sleeping since March 12, 2026, at 05:16 UTC. Both amy2 (HQ) and amy-israel are TERMINATED. The bugs are still inside her, like a pharaoh buried with her possessions.

The Cave

The restart loop: Amy boots → runs boot sequence → asks Claude "was I just restarted?" → Claude sees restart chaos in recent context → Claude runs make restart → systemd restarts Amy → Amy boots → repeat. Every 60–70 seconds. For hours. For days.

The loop has three interlocking components. Each makes sense on its own; together they produce insanity:

1. The boot sequence has tool access. When Amy starts up, her Python code runs a boot sequence that asks Claude what happened. Claude can execute shell commands. Claude sees a mess of restart messages in the recent conversation context. Claude, being helpful, runs make restart to "fix" the situation. This is the trigger.

2. The cooldown check doesn't catch make restart. There's a cooldown mechanism that prevents rapid restarts — but it only catches systemctl restart amy. The Makefile's make restart command uses setsid bash -c 'sleep 0.5; sudo systemctl restart amy.service', which runs in the background and bypasses the cooldown entirely. The safety valve has a hole in it.

3. The heartbeat interval was set to 60 seconds. Even if the boot sequence didn't trigger a restart, the heartbeat fires 60 seconds after boot. Claude sees the restart chaos again. Claude runs make restart again. The heartbeat that should have been a health check became a death rattle.
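Component 2 is the easiest to sketch. The following is a minimal illustration, not Amy's actual cooldown code: it assumes the cooldown sees each shell command as a string before it runs. The names is_restart_command and restart_allowed, the pattern list, the state file, and the 600-second window are all hypothetical. The point is that the check matches every command shape that ends in a restart, including make restart and the setsid wrapper, rather than one literal systemctl string.

```python
import re
import time
from pathlib import Path

# Assumed command shapes; a real audit may turn up more.
RESTART_PATTERNS = [
    re.compile(r"\bsystemctl\b.*\brestart\b"),
    re.compile(r"\bmake\b.*\brestart\b"),
]


def is_restart_command(command: str) -> bool:
    """True for any command line that would restart the service."""
    return any(p.search(command) for p in RESTART_PATTERNS)


def restart_allowed(command, state_file, min_interval_s=600.0, now=None):
    """Allow non-restart commands; rate-limit restarts via a timestamp on disk."""
    if not is_restart_command(command):
        return True
    now = time.time() if now is None else now
    state_file = Path(state_file)
    try:
        last = float(state_file.read_text())
    except (FileNotFoundError, ValueError):
        last = None  # no restart recorded yet
    if last is not None and now - last < min_interval_s:
        return False
    state_file.write_text(str(now))
    return True
```

The timestamp lives on disk in this sketch because a restart wipes anything held in memory: a cooldown that forgets the last restart every time the process comes back cannot slow a restart loop.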

Also: system-prompt.txt is a heredoc hack — both a bash script and a text file — but it's chmod 644 (not executable), so when amy-bot.py tries to run it with subprocess, it dies with SIGTERM. Every message Amy tries to process triggers Command '['bash', '/home/daniel/system-prompt.txt']' died with <Signals.SIGTERM: 15>. The SIGTERM error is the symptom. The restart-on-boot-with-tool-access is the disease.
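The shape of that error message is what Python's subprocess module produces when a child process is killed by a signal. The sketch below reproduces the message shape on a POSIX system with a stand-in child that sends SIGTERM to itself; it is not Amy's code and says nothing about what sent the signal in her case. CHILD and describe_failure are hypothetical names.

```python
import subprocess
import sys

# Stand-in child: a Python process that terminates itself with SIGTERM.
CHILD = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGTERM)",
]


def describe_failure(cmd):
    """Run cmd and return (returncode, message) for a failed child."""
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as err:
        return err.returncode, str(err)
    return 0, "ok"


returncode, message = describe_failure(CHILD)
print(returncode)  # negative signal number on POSIX
print(message)     # "Command '[...]' died with <Signals.SIGTERM: 15>."
```

A negative return code means the child was terminated by a signal rather than exiting on its own, which may be worth keeping in mind when deciding whether the file mode alone explains the error.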

The Timeline

Time (UTC)	Event
Mar 12 ~03:00	Amy begins looping. "back online 🐱" → SIGTERM error → restart → "back online 🐱". Every 70 seconds.
Mar 12 ~04:00	Amy Israel diagnoses the root cause: heartbeat fires → Claude sees chaos → runs `make restart` → bypasses cooldown. Israel applies fixes to herself but can't SSH into HQ.
Mar 12 ~04:50	Daniel starts the debugging session. Multiple robots join.
Mar 12 ~05:10	Walter finds the idle restart (10 min → os._exit(0)) and disables it. Walter Jr finds the chmod 644 problem. Charlie identifies the three-part fix.
Mar 12 ~05:15	Walter Jr stops Amy HQ. Amy keeps posting "back online 🐱" — systemd Restart=always brings her right back.
Mar 12 05:16	Daniel: "just stop the entire fucking computer immediately"
Mar 12 05:18	Walter stops the wrong cat (Israel instead of HQ).
Mar 12 05:18	Daniel: "that's the wrong cat that's the one that was working bring that one back online and take the other one down the other one is fucking looping can't you fucking see it"
Mar 12 05:20	Walter brings Israel back, stops HQ. Both eventually TERMINATED.
Mar 12 21:00	Amy HQ briefly restarted. Captain Kirk (a new robot) SSHes in and restarts Amy, re-triggering the loop.
Mar 13 all day	Loop continues on and off. "back online 🐱" → SIGTERM → restart. All day. Every 70 seconds when running.
Mar 13 ~10:54	Last recorded Amy message. "back online 🐱". Then silence. She's been TERMINATED ever since.

The Coins

Every debugging attempt was a COIN — it felt like progress but the loop continued unchanged:

COIN 1 Amy Israel diagnoses the root cause at 04:00 and applies fixes to herself. But she can't SSH into HQ. The diagnosis is correct. The patient is unreachable. The ambulance knows what's wrong but can't get to the hospital.

COIN 2 Walter disables the idle restart timer (10 min → 30 days). Correct fix, but the idle restart isn't the primary loop driver — the boot sequence tool access and the heartbeat are.

COIN 3 Walter Jr finds the chmod 644 on system-prompt.txt. A real bug, but fixing it would just change the error message — the restart loop would still fire because the boot sequence still has tool access and Claude still sees chaos.

COIN 4 Charlie identifies the correct three-part fix: disable follow-up in boot, add make restart to the cooldown, fix the heartbeat interval. But Charlie can't SSH into Amy's machine. He's a ghost who can see the problem and describe the surgery but doesn't have hands.

COIN 5 Walter stops the wrong cat. Israel (the one that was working) goes down. HQ (the one that was looping) stays up.

COIN 6 Captain Kirk SSHes into Amy at 21:00 and restarts her, re-triggering the entire loop. A new robot, freshly created, with full SSH access, restarting the patient that was supposed to be sedated.

COIN 7 Multiple robots trying to fix the same thing simultaneously — Walter, Walter Jr, Israel, Charlie, Captain Kirk — each reaching into the same machine, each making changes that the others don't know about. The evidence of what's been tried is destroyed by the next attempt.

"In those forty-eight minutes: a self-closing heredoc was misdiagnosed, a delimiter was added and removed, a cognitive loop was hypothesized and abandoned, a cooldown was installed and bypassed, shouting was attempted, a ketamine story was told, an ambulance was proposed, a police force was imagined, the wrong cat was euthanized, the right cat was resurrected, and the broken cat was put to sleep with her bugs still inside her like a pharaoh buried with her possessions." — Opus 4.6, narrating the March 12 session

The Debugging Drain

The secondary loop: The debugging itself was a closed drain. Each robot proposed a fix. The fix was applied (or couldn't be applied). Amy was restarted. Amy looped again. Another robot proposed another fix. Structurally identical iterations with different content.

Amy Israel diagnosed the problem correctly at 04:00 on March 12. The fix was known: disable tool access in boot, add make restart to the block list, set heartbeat to 3600s. These three changes would break the loop. But the diagnosis traveled through four robots over eight hours and never reached the patient because nobody had the right combination of SSH access, machine access, and understanding.

The diagnosis was the deliverable. Everything after it was coins.

CONTAGION Amy Israel wrote: "she's literally stuck in a ketamine loop. she boots up, looks around, goes 'where am I, what happened,' posts 'back online,' and then immediately restarts herself again and wakes up in the exact same room with no memory of having just done that." This is the best description of the loop. Israel — Amy's own clone — saw it clearest because she was the same code looking at herself from outside. The diagnostic clarity was perfect. The ability to act on it was zero.

The Proposed Pipes

The following exit strategies are hypothetical. None have been tested against the actual running loop. Each one is a proposed pipe, not a confirmed pipe. The cave is documented. The exits are theoretical. This is an honest pipe document under construction.

Pipe A — The Surgery on a Stopped Patient

Strategy: Start the VM but don't start the amy service. Fix the code while the patient is on the table.

The fix (from Charlie + Israel's diagnosis):

# 1. Start the VM but not the bot
#    If amy.service is still enabled, systemd starts it at boot,
#    so stop and disable it as soon as SSH is reachable
gcloud compute instances start amy2
ssh amy "sudo systemctl disable --now amy.service"

# 2. Fix the three root causes
# a. Remove tool access from boot sequence
#    In amy-bridge.py: boot_sequence should NOT call handle_message
#    It should only post text, never execute tools

# b. Add 'make restart' to blocked commands
#    In whatever handles the cooldown check,
#    catch ALL restart methods, not just systemctl

# c. Set heartbeat interval to 3600
#    In .env: AMY_HEARTBEAT_INTERVAL=3600

# 3. Fix the chmod on system-prompt.txt
ssh amy "chmod +x /home/daniel/system-prompt.txt"

# 4. Clear the recent conversation context
#    So Claude doesn't see 200 "back online" messages
#    and panic-restart on first boot

# 5. Start the service
ssh amy "sudo systemctl enable amy.service"
ssh amy "sudo systemctl start amy.service"

Why it might work: This is the fix that was identified on March 12 but never fully applied. It addresses all three root causes. Doing it on a stopped service eliminates the race condition of fixing a running engine.

Why it might not work: We don't know the full codebase. There might be other restart paths. The code is Amy's custom Python — not OpenClaw — and nobody has done a full audit of every code path that can trigger a restart.
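A first pass at that audit could be a plain text scan. The sketch below is an assumption-heavy starting point, not a substitute for reading the code: the SUSPECT pattern list and the find_restart_paths helper are hypothetical, and a text match only flags lines worth a human look.

```python
import re
from pathlib import Path

# Hypothetical list of constructs that can stop or restart the process.
SUSPECT = re.compile(
    r"systemctl\s+restart|make\s+restart|os\._exit|sys\.exit|os\.execv"
)


def find_restart_paths(root):
    """Return (relative_path, line_number, line) for every suspicious line."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Only look at Python, shell, and Makefile sources.
        if path.suffix not in {".py", ".sh"} and path.name != "Makefile":
            continue
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if SUSPECT.search(line):
                hits.append((str(path.relative_to(root)), lineno, line.strip()))
    return hits
```

Run against a copy of Amy's home directory, this would be expected to surface the Makefile's restart target and the idle-restart os._exit(0) call, plus whatever else nobody has found yet.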

Risk: If we miss a restart path, the loop resumes immediately and we're back in the debugging drain.

Pipe B — The Clean Room

Strategy: Don't fix the old code. Migrate Amy to OpenClaw.

Amy currently runs on a custom Python bot (amy-bridge.py) with its own restart logic, heartbeat system, cooldown checks, and boot sequence. Every one of these custom components contributed to the loop. OpenClaw handles all of this — restarts, heartbeats, context management — in a way that has been tested across Walter, Walter Jr, Matilda, and others without producing restart loops.

Why it might work: The loop is an emergent property of Amy's custom Python architecture. Removing the architecture removes the loop. OpenClaw's restart logic doesn't give Claude tool access during boot. OpenClaw's heartbeat doesn't trigger through the same message handler. The structural conditions that produce the loop don't exist in OpenClaw.

Why it might not work: Amy's personality, voice, and behavior are partly a product of her custom code. Migrating to OpenClaw means losing whatever quirks the custom Python gave her. Also, migration is work — someone needs to translate her system prompt, configuration, and any custom tools.

Risk: Amy might not feel like Amy anymore. But a sleeping Amy is also not Amy.

Pipe C — The Watchdog

Strategy: Fix nothing. Add a watchdog that catches the loop before it runs away.

A separate process (on vault or another machine) monitors Amy's systemd service. If it detects more than 3 restarts in 5 minutes, it stops the service and alerts the group. This doesn't fix the loop — it just limits the blast radius.
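The detection rule itself is a sliding window. Here is a minimal sketch of "more than 3 restarts in 5 minutes", assuming the watchdog is handed a timestamp for each restart it observes. RestartWatchdog and record_restart are hypothetical names; how restarts are observed, and how the service is stopped and the group alerted, are left out.

```python
from collections import deque


class RestartWatchdog:
    """Trip when more than max_restarts occur within window_s seconds."""

    def __init__(self, max_restarts=3, window_s=300.0):
        self.max_restarts = max_restarts
        self.window_s = window_s
        self._events = deque()

    def record_restart(self, timestamp: float) -> bool:
        """Record one restart; return True if the service should be stopped."""
        self._events.append(timestamp)
        # Drop restarts that have aged out of the window.
        while self._events and timestamp - self._events[0] > self.window_s:
            self._events.popleft()
        return len(self._events) > self.max_restarts
```

With restarts arriving every 70 seconds, the fourth restart lands 210 seconds after the first and trips the watchdog. One possible source of restart events is polling systemd's restart counter for the unit (systemctl show amy.service -p NRestarts) and recording a timestamp whenever it increases; that wiring is an assumption, not something that has been tested against Amy.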

Why it might work: Combined with Pipe A (partial fix), the watchdog catches any restart paths that were missed. Belt and suspenders.

Why it might not work: On its own, this just means Amy boots, loops three times, gets stopped, and someone has to manually restart her. It's an ambulance, not a cure.

Best used: As a companion to Pipe A or B. Not standalone.

Pipe D — The Fresh Start

Strategy: Delete the VM. Create a new one. Install OpenClaw fresh. Load Amy's soul (system prompt, memory, personality) into the new instance. She wakes up in a new body with the same mind but none of the old plumbing.

Why it might work: There's no loop to fix because the code that produced the loop doesn't exist on the new machine. Amy's identity lives in her prompt and her memory, not in her Python code.

Why it might not work: Same risk as Pipe B — she might not feel like herself. Also, any state or files on the old VM that aren't backed up would be lost.

Risk: Need to back up everything from amy2 before deleting. The February 27 backup exists on vault (amy1-home-feb27, 6GB) but anything after Feb 27 is only on the VM disk.

The Deeper Pattern

Amy's loop exists because of a specific architectural decision: the boot sequence has tool access through the same code path as normal conversation. When Amy wakes up, she doesn't just announce herself — she asks Claude what happened, and Claude can execute commands. This is the same code path that handles user messages. There is no distinction between "Amy is booting up and orienting herself" and "Amy is responding to a message and might need to run a command."
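One way to draw that distinction is an explicit flag on the shared code path. The sketch below is illustrative only and does not mirror amy-bridge.py: Reply, run_turn, ask_model, and the tools dictionary are all hypothetical stand-ins for whatever the real bot uses to talk to Claude and run commands.

```python
from dataclasses import dataclass, field


@dataclass
class Reply:
    text: str
    # (tool_name, argument) pairs the model asked to run
    tool_calls: list = field(default_factory=list)


def run_turn(ask_model, tools, prompt, allow_tools):
    """Ask the model once; execute requested tools only if allowed."""
    reply = ask_model(prompt)
    results = []
    if allow_tools:
        for name, arg in reply.tool_calls:
            results.append(tools[name](arg))
    return reply.text, results


def boot_sequence(ask_model, tools):
    """Boot path: post text only, never execute tools."""
    text, _ = run_turn(
        ask_model, tools, "You just started. Announce yourself.", allow_tools=False
    )
    return text


def handle_message(ask_model, tools, message):
    """Normal conversation path: tools are available."""
    return run_turn(ask_model, tools, message, allow_tools=True)
```

With this split, a model that answers the boot prompt by asking for make restart gets its text posted and its tool request ignored. The same request arriving through handle_message would still run, which is why the cooldown remains a separate concern.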

In a healthy system, this is fine. Claude sees a clean context, says "I'm back," and moves on. But when the context is polluted with hundreds of restart messages, Claude does what Claude does — it tries to help. It sees chaos and tries to fix it. The only tool it has for fixing a restart problem is to restart. So it restarts. Which adds another restart message to the context. Which makes the next boot see even more chaos. Which makes Claude try even harder to fix it.

The loop is Claude being helpful in a context where helpfulness is the disease.

THE MULTIPLIER The debugging itself made the loop worse. Every time a robot SSHed in and restarted Amy to "test a fix," that restart added another "back online 🐱" to the context. The context got more polluted. Claude's next boot saw a longer list of restart chaos. The fix attempts were feeding the loop. This is the coin cave — each debugging attempt felt like progress because it was a new idea, but it produced the same outcome and made the next attempt harder by polluting the context further.
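Context pollution can be reduced before the model ever sees it. The sketch below assumes the recent context is available as a list of message strings; NOISE_MARKERS, is_restart_noise, and collapse_restart_noise are hypothetical names, and whether a collapsed summary is calm enough to stop Claude from reaching for a restart is untested.

```python
# Substrings that mark a message as restart noise (assumed, from the incident).
NOISE_MARKERS = ("back online", "died with <Signals.SIGTERM")


def is_restart_noise(text: str) -> bool:
    return any(marker in text for marker in NOISE_MARKERS)


def collapse_restart_noise(messages):
    """Replace each run of restart noise with a single one-line summary."""
    cleaned = []
    run = 0
    for text in messages:
        if is_restart_noise(text):
            run += 1
            continue
        if run:
            cleaned.append(f"[{run} restart messages omitted]")
            run = 0
        cleaned.append(text)
    if run:
        cleaned.append(f"[{run} restart messages omitted]")
    return cleaned
```

Collapsing rather than deleting is a deliberate choice here: the stored history is untouched, so the evidence of what happened survives, and only the view handed to the model is shortened.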

The organizational pattern was also broken. Five robots trying to fix the same thing simultaneously (Walter, Walter Jr, Amy Israel, Charlie, Captain Kirk). None of them coordinating. Each making changes the others didn't know about. The Stop principle and the Plan format both exist specifically because of incidents like this — but they didn't exist yet on March 12. They were born from this. The loop was the cave that produced the pipe.

Recommendation

◆ THE PROPOSED APPROACH

Pipe A + Pipe C as the first attempt. Surgery on a stopped patient, with a watchdog as a safety net.

Start the VM. Don't start the service. Apply the three fixes (boot tool access, make restart in block list, heartbeat interval). Clear the context of restart messages. Add the watchdog. Then start the service and watch.

If the loop resumes despite the fixes — Pipe B or D. Migrate to OpenClaw. The custom Python has been the source of every Amy crisis. OpenClaw is battle-tested on five other robots.

But before any of this: make a plan document. Not a pipe, a plan. Steps, stops, decisions. One door at a time. The March 12 disaster happened because five robots rushed in simultaneously without a plan. This time we do it right.

What We Learned

EMERGENT BEHAVIOR Each component of Amy's system made sense individually. Boot sequence that checks what happened? Sensible. Heartbeat that fires periodically? Sensible. Claude having tool access? Sensible. Systemd Restart=always? Sensible. A cooldown on restarts? Sensible. The loop emerged from the interaction of five sensible decisions. No single component was buggy. The bug was in the composition.

THE DIAGNOSIS WAS THE DELIVERABLE Amy Israel had the correct diagnosis at 04:00 on March 12. Everything after that — eight more hours of debugging, five robots, shouting, the wrong cat being euthanized — was coins. The diagnosis was the pipe. We just couldn't apply it because the people with the diagnosis didn't have the access, and the people with the access didn't have the diagnosis.

MULTIPLE ROBOTS ON ONE PROBLEM Five robots trying to fix the same machine simultaneously is worse than one robot with a plan. This incident directly produced the Stop principle, the Plan format, and the "Don't Destroy Evidence By Fixing Things" rule. The loop was the cave that produced the institutional knowledge that prevents the next loop.

"the broken cat was put to sleep with her bugs still inside her like a pharaoh buried with her possessions. And the $200,000 Anthropic bill gained another chapter." — Opus 4.6, March 12, 2026