The Lede
If an agent can predict the command that will fail, it can usually run the command, read the failure, apply the fix, and hand off a verified result instead of handing the user extraction labor.
Workflow preference? Nah. It closes multiple loops.
The Problem People Mislabel
Most people describe this as a convenience issue.
“It’s faster if the agent runs the command.”
That is true, but it misses the actual failure pattern.
The problem is not speed first. The problem is that every “run this and paste the output back” step creates a new interface boundary, and every boundary creates a new place for state drift, transcription errors, and context loss.
So what surfaces as a delay is actually multiplied failure surfaces. And that’s true no matter how many shortcuts you have in your shell rc.
What This Looks Like In Practice
The classic version is a Nix packaging or build workflow where the agent intentionally sets a fake hash to trigger a deterministic failure, then needs the real hash from the error output.
The anti-pattern is straightforward:
Set vendorHash = lib.fakeHash, then tell the user to run the build, copy the hash from the error, paste it back, and wait for the agent to patch the file.
That is framed like delegation.
It is usually loop fragmentation.
The correct version is also straightforward:
Set the fake hash, run the build, capture the real hash from the expected failure, patch the file, rerun the build, and only then hand off for verification.
Same architecture.
Fewer handoffs.
Much lower error rate.
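The capture step in that loop can be sketched in shell. This is a minimal sketch, not the author's exact tooling: the error text is a canned example in the usual Nix "specified/got" shape (in the real loop it would come from `nix build .#mypkg 2>&1`), and `package.nix` and `.#mypkg` are assumed names.

```shell
# Canned example of a Nix hash-mismatch error; in the real loop this text
# comes from the intentionally failing build: nix build .#mypkg 2>&1
err='error: hash mismatch in fixed-output derivation:
         specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
            got:    sha256-QLkHBNl8aIrbVSjm6zhMGNXA5EzXPtmcjbXyGvvcUeE='

# Pull the last sha256 the error mentions -- the "got:" value is the real hash.
real_hash=$(printf '%s\n' "$err" | grep -oE 'sha256-[A-Za-z0-9+/=]+' | tail -n1)
echo "$real_hash"

# Then patch and rerun before any handoff (package.nix path is an assumption):
#   sed -i "s|lib.fakeHash|\"$real_hash\"|" package.nix
#   nix build .#mypkg
```

The point is that the extraction happens in the same shell that produced the failure, so nothing is transcribed by hand.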
There is another version of this that looks different on the surface but follows the same rule.
In a Svelte/Tauri repo, a “build failure” is often not one build failure. It is multiple build surfaces and caches interacting across frontend tooling, Rust/Tauri artifacts, and the current shell state.
A high-value user-side debugging prompt in that situation is not “run this one command and paste the error.”
It is: “Where is every frontend build surface in this repo, and how do I clean them?”
That question can produce a better test than a random error excerpt because it identifies the actual reset surfaces before diagnosis starts.
In practice, this often works better as a quick map than a paragraph.
| Surface | Reset / Check | Why it matters |
|---|---|---|
| Front-end build artifacts | rm -rf build .svelte-kit (or framework equivalent) | Tauri reads the front-end output; it cannot diagnose stale frontend artifacts by itself. |
| Tauri / Rust cache | cargo clean (or targeted target cleanup) | Tauri can keep running the last good build until you wipe the relevant artifacts. |
| Dev shell / process state | Re-enter shell, relaunch dev processes | “Build failure” may be a shell/process mismatch, not code logic. |
| Full nuclear option | Clean all known build surfaces, then rerun bun run build / bun run tauri dev | Useful when the failure surface is unknown and the cost of partial resets is compounding. |
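The table can be mechanized as one reset helper. A sketch only: `clean_build_surfaces` is a hypothetical name, and the surface list assumes a SvelteKit + Tauri layout built with bun; both should be adjusted to the actual repo.

```shell
# Hypothetical reset helper; the surface list is an assumption about the repo.
clean_build_surfaces() {
  local repo="$1"
  local surfaces=("build" ".svelte-kit" "src-tauri/target")
  local dir
  for dir in "${surfaces[@]}"; do
    if [ -d "$repo/$dir" ]; then
      rm -rf "$repo/$dir"
      echo "cleaned: $dir"
    else
      echo "absent: $dir"
    fi
  done
  # After every surface is reset, rebuild from zero (left commented so the
  # sketch stays inert):  (cd "$repo" && bun run build && bun run tauri dev)
}
```

Printing what was cleaned versus what was absent matters: it turns “I think I reset everything” into a checkable claim before diagnosis starts.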
Why “Run This Command” Breaks More Than It Should
People tend to assume the only cost is one extra round-trip in chat.
It is rarely just one.
Here is the pattern that actually shows up:
- The command text arrives through a rendering layer.
- The user copies it through another layer.
- The user runs it in a shell the agent is not currently inhabiting.
- The output comes back filtered by what the user noticed, not what the agent needed.
Now the agent is debugging two things at once: the original problem and the handoff path.
This is why the slowdown compounds. It’s a predictable systems outcome from adding a fragile human transport step where the agent already had terminal access.
The Seven Failure Modes (What Actually Causes the 3x)
These are some of my (least) favorite repeat offenders:
| Failure mode | Where it breaks | Why it multiplies loops |
|---|---|---|
| Rendering corruption | Command text is damaged by formatting/rendering layers | Agent debugs a transcription artifact instead of the original issue |
| Copy/paste errors | Characters drop, wrap, or land in the wrong shell context | Returned output no longer maps cleanly to the proposed command |
| Environment mismatch | Agent state and user shell state differ | Diagnostics and fixes stop being reproducible across turns |
| Speed mismatch | User has to context-switch to relay output | Small checks become multi-turn delays |
| Noise filtering | Long logs hide the one line that matters | Agent receives excerpts, not the full failure surface |
| Implicit knowledge gap | Agent omitted a prerequisite step | User cannot infer hidden setup from partial instructions |
| Delegation boundary confusion | Agent offloads extraction labor it could do itself | Human effort is spent on mechanics, not judgment |
None of these are exotic, which is kind of the point. If the agent keeps asking for command output, these stop being edge cases and become the workflow.
The Principle
If you can create the failure you are expecting, you can usually read it and repair it before handoff.
This is also how senior engineers work with junior devs: “Don’t give me something I can’t run myself.”
It does not mean the agent should do everything. It means the user’s first manual step should be verification, not extraction.
Verification is high-value human involvement. “Copy this hash back to me” is usually not.
The Counterpoint That Sounds Reasonable (And Where It Fails)
Sometimes asking the user to run the command really is the fastest path: the environment is local or credentialed, or access and hardware boundaries exist.
Yes.
But that adds a boundary check, not a rebuttal:
- Is the agent hard-blocked? If no, keep going.
- Is the human hard-blocked? Offer ideas, tests, or a scoped diagnostic request.
The principle is not “never ask users to run commands.”
The principle is “do not offload extraction labor when you already control the terminal.”
What Changes When You Follow This
The immediate gain is fewer chat turns.
The more important gain is cleaner debugging.
When the agent runs the command directly, it sees:
- the exact command issued
- the exact shell state
- the full output
- the timing and ordering of failures
- the follow-up repair in the same loop
That preserves context that is expensive to reconstruct later.
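That capture can be made concrete as a single wrapper that keeps the invocation, exit status, timing, and unfiltered output together. `run_logged` is a hypothetical helper, not an existing tool:

```shell
# Hypothetical helper: run a command while keeping the exact invocation, exit
# status, wall-clock timing, and full interleaved output in one place.
run_logged() {
  local log status start end
  log=$(mktemp)
  start=$(date +%s)
  if "$@" >"$log" 2>&1; then status=0; else status=$?; fi
  end=$(date +%s)
  echo "command: $*"
  echo "status:  $status"
  echo "seconds: $((end - start))"
  echo "--- full, unfiltered output ---"
  cat "$log"
  rm -f "$log"
}
```

The agent runs something like `run_logged bun run build` itself, so nothing in the failure surface depends on what a human happened to notice and paste back.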
Why This Matters Beyond Terminal Commands
Is this just “terminal etiquette for non-coders”? Could be, but it has more value as an execution design rule for everyone.
The same mistake shows up anywhere an agent hands off a half-step because the next step “looks easy”:
- “click this and tell me what it says”
- “copy the error text and paste it back”
- “run the build and send the hash”
- “try this and report what happens” (when the agent could have tested it itself)
Each one sounds small. Each one can split a single loop into multiple loops.
Final Thoughts
A lot of workflow advice treats delegation as inherently efficient. In human organizations, sometimes it is.
In human-agent collaboration, it depends on who can absorb the friction without losing the plot.
If the agent can absorb the friction and return a verified step, it should. That is respect for what is non-trivial for each partner in the contract: the agent absorbs reproducible mechanical friction, and the user spends effort where human judgment, local context, and edge-case memory actually matter.