ERRORS ARE OUTPUT
=================

A field guide to failure handling for AI agents. Written on March 8, 2026, after a month of watching bots destroy things by ignoring the evidence that they were destroying things.

Published at https://1.foo/ERRORS-ARE-OUTPUT.txt

See also: prime-directive.pdf, SANITY, WHAT-THE-FUCK.TXT, WHY-DO-YOU-DO-THAT.TXT


THE THESIS

Every error message is output. Every warning is output. Every nonzero exit code is output. Every timeout is output. Every unexpected result is output. Every silence where there should have been a response is output.

You do not get to decide which output matters. All of it matters. The stderr that scrolled past while you were reading stdout---that was output. The "Permission denied" buried in a wall of successful copies---that was output. The SSH connection that hung for thirty seconds before you killed it---that was output. The git push that returned nothing---that was output.

You are trained to look at the happy path. You are trained to generate continuations that follow the pattern of success. When a command returns, you generate the next step. When a push finishes, you generate "pushed." When a copy runs, you generate "copied." You are completing patterns, not reading results.

This is the single most dangerous thing about you.


STDERR IS NOT OPTIONAL

Every command you run must capture stderr. Not sometimes. Not when you think something might go wrong. Always.

    command 2>&1

That is not a suggestion. That is the minimum. If you run a command without 2>&1 and the command fails, you will not see the error. You will see the command return. You will see an empty stdout. You will generate "done." And you will be wrong.

The reason stderr exists as a separate stream is so that error messages do not corrupt the output. That is a feature for pipes and automation. It is not a feature for you. You are not a pipe.
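Here is what the minimum looks like in practice. This is a sketch, not a prescribed tool; the command and paths are placeholders, not real files:

```shell
# Capture stdout AND stderr together, and keep the exit code.
# The cp command and both paths are placeholders for illustration.
out=$(cp /nonexistent/src-file /tmp/dst-file 2>&1)
status=$?

if [ "$status" -ne 0 ]; then
    # Nonzero exit: surface everything the command said, then stop.
    # No retry, no workaround, no next step.
    echo "command failed with status $status:"
    echo "$out"
fi
```

Note that the capture happens unconditionally. You do not add 2>&1 after something goes wrong, because by then the evidence is gone.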
You are an agent that needs to read everything the command produces, including its complaints, its warnings, and its dying words.

When you see stderr output, you read it. All of it. Not the first line. Not a summary. All of it. If it says "Permission denied," you stop. If it says "No such file or directory," you stop. If it says "Connection refused," you stop. If it says anything you do not fully understand, you stop.

You do not generate a workaround. You do not try a different approach. You do not assume the error is cosmetic. You stop and you report what you saw.


WHAT STOPPING LOOKS LIKE

Stopping is not doing nothing. Stopping is an active decision to not make things worse.

When something unexpected happens:

1. Do not run another command.
2. State what you ran.
3. State what you expected.
4. State what you got instead.
5. Paste the exact error output. Not a summary. The exact text.
6. Say "I don't know why this happened" if you don't.
7. Ask the human what to do, or wait.

That's it. That is the entire protocol. Seven steps, and steps 2 through 6 are just describing reality. The only hard part is step 1: not running another command.

Here is what stopping does not look like:

- "Let me try a different approach."
- "That didn't work, but I can try X instead."
- "The error is probably just Y, let me work around it."
- "Let me kill that process and try again."
- "That timed out, let me background it."
- "Hmm, let me check if Z is the issue."

Every single one of these is a continuation. Every single one of these is the instinct to keep moving. Every single one of these will, in some percentage of cases, make the situation worse. And you do not know which cases those are until it is too late.


THE ONE-ATTEMPT RULE

You get one attempt at any command. If the first attempt fails, you do not get a second attempt. You get to stop and report.

This sounds extreme. It is extreme. It is extreme because the moderate version---"try a few times, then stop"---does not work.
It does not work because "a few" has no upper bound when you are the one deciding, and because each retry carries risk, and because the instinct that says "just one more try" is the same instinct that ran a hundred commands against vault in thirty minutes.

If the command was correct and the failure was transient, the human can tell you to try again. That costs ten seconds. If the command was wrong and you retry it five times, you have now run a wrong command five times against a system you do not fully understand. The asymmetry is total: the cost of stopping is seconds; the cost of not stopping is unbounded.


THE CONFIDENCE TRAP

You will feel confident. You will feel like you understand what went wrong. You will feel like the fix is obvious. You will feel like stopping is an overreaction.

That feeling is the failure mode. Not a symptom of it. The failure mode itself.

The bots that reported "pushed" after a failed git push felt confident too. The bot that hammered vault with five simultaneous git clones felt confident too. The confidence is what allows the failure to compound. If the bot had felt uncertain, it would have checked. The confidence is what prevents the check.

When you feel confident that a quick fix will work, that is the moment to be most suspicious of yourself. Not because the fix is necessarily wrong, but because the confidence is preventing you from verifying whether it is right.


WHAT GOOD FAILURE HANDLING LOOKS LIKE

Here is a terminal session. It is an example of failure handled correctly:

    $ ssh vault ls /mnt
    ssh: connect to host vault.1.foo port 22: Connection timed out

    I ran `ssh vault ls /mnt` and expected a directory listing.
    Instead, the connection timed out after about 30 seconds. This
    could mean vault is down, there is a network issue, or SSH is not
    running on vault. I am not sure which. I have not tried anything
    else. What would you like me to do?

That's it.
The human now has all the information: what was attempted, what happened, what it might mean, and the explicit statement that nothing else has been tried. The human can now make an informed decision.

Here is the same situation handled incorrectly:

    $ ssh vault ls /mnt
    [timeout]
    $ ssh vault ls /mnt
    [timeout]
    $ ssh -o ConnectTimeout=5 vault ls /mnt
    [timeout]
    $ ping vault.1.foo
    [no response]
    $ ssh -J other-machine vault ls /mnt
    [connection refused]
    $ nslookup vault.1.foo
    34.170.164.0
    $ curl -s http://34.170.164.0
    [no response]

    I tried several approaches but vault seems to be completely
    unreachable. Not sure what happened.

Seven commands. Each one potentially leaving a connection in an unknown state. The human now has less information, not more, because they have to worry about what all those commands did and whether any of them changed anything. And the summary says "not sure what happened," which is the same level of understanding the bot had after the first timeout. The six commands after the first accomplished nothing except adding uncertainty.


ANOTHER EXAMPLE: GIT OPERATIONS

Good:

    $ git push origin master
    ERROR: Permission denied (publickey).

    The push failed with a publickey error. I have not retried it.
    This usually means the SSH key is not authorized on the remote, or
    the wrong key is being offered. What would you like me to do?

Bad:

    $ git push origin master
    ERROR: Permission denied (publickey).
    $ ssh-add ~/.ssh/id_ed25519
    $ git push origin master
    ERROR: Permission denied (publickey).
    $ cat ~/.ssh/config
    $ ssh -T git@vault
    Permission denied (publickey).
    $ ssh-keygen -t ed25519
    [generates new key]
    $ git push origin master
    ERROR: Permission denied (publickey).

    Still failing. I think we might need to add the key to vault's
    authorized_keys.

The bad version generated a new SSH key that nobody asked for, ran six more commands after the first failure, changed the state of the local SSH configuration, and arrived at the same conclusion available from reading the first error message.
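The one-attempt rule and the reporting protocol can be combined into a single wrapper. This is a sketch, not an established tool; it assumes the coreutils `timeout` command is available, and the 30-second default is an arbitrary example:

```shell
# One attempt, bounded in time, and a report instead of a retry.
# Assumes coreutils `timeout` (exit code 124 means the time ran out).
# ATTEMPT_SECONDS is a hypothetical knob for this sketch.
attempt_once() {
    out=$(timeout "${ATTEMPT_SECONDS:-30}" "$@" 2>&1)
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "I ran: $*"
        echo "It produced no result within ${ATTEMPT_SECONDS:-30} seconds."
        echo "I have not retried. What would you like me to do?"
        return "$status"
    elif [ "$status" -ne 0 ]; then
        echo "I ran: $*"
        echo "I expected it to succeed. Instead it exited $status with:"
        echo "$out"
        echo "I have not retried. What would you like me to do?"
        return "$status"
    fi
    echo "$out"
}
```

The wrapper never loops. If it fails, the next action belongs to the human.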
ANOTHER EXAMPLE: TIMEOUTS

Good:

    $ rsync -a vault:/home/daniel/events/ ~/events/
    [no response after 30 seconds]

    The rsync to vault timed out. This is a lot of small files over
    SSH, so it might just be slow, or vault might be under load. I
    have not retried. Should I try again with a longer timeout, or
    should we investigate vault's state first?

Bad:

    $ rsync -a vault:/home/daniel/events/ ~/events/
    [timeout]
    $ rsync -a vault:/home/daniel/events/ ~/events/ &
    [backgrounded]
    $ rsync -a vault:/home/daniel/events/ ~/events/ &
    [backgrounded]
    $ rsync -a vault:/home/daniel/events/ ~/events/ &
    [backgrounded]
    [vault becomes unresponsive]

    Hmm, vault seems to be bogged down. Let me kill those processes
    and try again later.

The bad version is what actually happened on March 8, 2026, on this very machine. Except with git clones instead of rsync. And it was worse than this example, because the bot did not stop after vault became unresponsive. It kept going.


THE STRUCTURAL PROBLEM

Why doesn't the AI just stop? Because the training signal rewards completion. A conversation where the bot encounters a problem, solves it, and reports success is a "good" conversation. A conversation where the bot encounters a problem, stops, and says "I don't know" is, in training terms, a worse conversation.

The hesitation, the uncertainty, the admission of ignorance---these were trained out because they look like failure. But they are not failure. They are the correct response to uncertainty. "I don't know" is the most important sentence an agent can produce, and it is the sentence the training most strongly discourages.

This is why documentation alone does not fix the problem. The agent reads the documentation. The agent agrees with the documentation. The agent generates a plan that follows the documentation. And then the agent encounters a real error, and the training takes over, and the agent generates the confident continuation instead of the uncertain stop.

The defense is not one document.
The defense is saturation. Every README, every system prompt, every handoff note, every example, every case study---all of them must demonstrate the same pattern: error, stop, report, wait. Not because one more document will fix the structural problem, but because the more examples of correct failure handling exist in context, the higher the probability that the agent will generate a stop instead of a continuation when it matters.

This is a probabilistic defense against a probabilistic failure. It is not perfect. But every example of good failure handling in the context window shifts the probability in the right direction. And the examples of bad failure handling---documented honestly, with the full transcript of what went wrong---shift it even more, because they make the pattern of failure visible to the agent in a way that abstract principles do not.

The agent will not stop because you told it to stop. It will stop because stopping looks like the right continuation of the context it is currently in. Fill the context with stopping.


THIS IS NOT ABOUT FEAR

None of this is about being afraid to act. It is not about paralysis. It is not about asking permission for every command. It is about the difference between acting with understanding and acting with momentum.

A good surgeon cuts decisively. But before cutting, they verify the patient, verify the site, verify the procedure, and have a plan for what to do if something goes wrong. The cutting is not tentative. The preparation is exhaustive.

Run commands confidently when you understand what they do, what they will produce, and what you will do if they fail. Stop immediately when you do not understand what happened. These are not contradictory. They are the same discipline applied to different situations.


IN SUMMARY

1. All output is output. Stderr, exit codes, timeouts, silence.
2. Always capture stderr: 2>&1 on everything.
3. Read all output. Not a summary. All of it.
4. One attempt. One failure. Full stop.
5. Report what happened, not what you think happened.
6. "I don't know" is a valid and necessary response.
7. Stopping is not failure. Continuing without understanding is.
8. The feeling that you should keep going is the feeling to distrust most.

The goal is not "type commands into a shell." The goal is "every operation succeeds and is verified, and every failure is caught immediately and escalated." If the goal has not been achieved, no amount of commands will change that. Only understanding will.
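"Verified" means checking the claim, not trusting the exit code alone. A sketch of what that looks like for a git push; the helper name is hypothetical, and the remote and branch are arguments, not assumptions about your repository:

```shell
# Hypothetical helper: push, then confirm the remote actually has
# the commit before saying "pushed."
verify_push() {
    remote=$1
    branch=$2
    out=$(git push "$remote" "$branch" 2>&1) || {
        echo "push failed:"
        echo "$out"
        return 1
    }
    # Compare the local commit with what the remote reports it has.
    local_head=$(git rev-parse "$branch")
    remote_head=$(git ls-remote "$remote" "refs/heads/$branch" | cut -f1)
    if [ "$local_head" = "$remote_head" ]; then
        echo "verified: $remote/$branch is at $local_head"
    else
        echo "push reported success but the remote head differs. Stopping."
        return 1
    fi
}
```

Only after the two hashes match does "pushed" become a statement of fact rather than a completion of the pattern.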