Building for Humans

Who Pays the Handoff Tax: Finish the Loop in Human-Agent Debugging

Every “run this and paste the output back” step creates a new interface boundary. Each boundary multiplies failure surfaces.

The Lede

If an agent can predict the command that will fail, it can usually run the command, read the failure, apply the fix, and hand off a verified result instead of handing the user extraction labor.

A workflow preference? Nah. It closes multiple loops.

The Problem People Mislabel

Most people describe this as a convenience issue.

“It’s faster if the agent runs the command.”

That is true, but it misses the actual failure pattern.

The problem is not speed first. The problem is that every “run this and paste the output back” step creates a new interface boundary, and every boundary creates a new place for state drift, transcription errors, and context loss.

So what surfaces as a delay is actually multiplied failure surfaces. And that’s true no matter how many shortcuts you have in your shell rc.

What This Looks Like In Practice

The classic version is a Nix packaging or build workflow where the agent intentionally sets a fake hash to trigger a deterministic failure, then needs the real hash from the error output.

The anti-pattern is straightforward:

Set vendorHash = lib.fakeHash, then tell the user to run the build, copy the hash from the error, paste it back, and wait for the agent to patch the file.

That is framed like delegation.

It is usually loop fragmentation.

The correct version is also straightforward:

Set the fake hash, run the build, capture the real hash from the expected failure, patch the file, rerun the build, and only then hand off for verification.

Same architecture.

Fewer handoffs.

Much lower error rate.

There is another version of this that looks different on the surface but follows the same rule.

In a Svelte/Tauri repo, a “build failure” is often not one build failure. It is multiple build surfaces and caches interacting across frontend tooling, Rust/Tauri artifacts, and the current shell state.

A high-value user-side debugging prompt in that situation is not “run this one command and paste the error.”

It is: “Where is every frontend build surface in this repo, and how do I clean them?”

That question can produce a better test than a random error excerpt because it identifies the actual reset surfaces before diagnosis starts.

In practice, this often works better as a quick map than a paragraph.

| Surface | Reset / check | Why it matters |
| --- | --- | --- |
| Front-end build artifacts | `rm -rf build .svelte-kit` (or framework equivalent) | Tauri reads the front-end output; it cannot diagnose stale frontend artifacts by itself. |
| Tauri / Rust cache | `cargo clean` (or targeted `target` cleanup) | Tauri can keep running the last good build until you wipe the relevant artifacts. |
| Dev shell / process state | Re-enter the shell, relaunch dev processes | “Build failure” may be a shell/process mismatch, not code logic. |
| Full nuclear option | Clean all known build surfaces, then rerun `bun run build` / `bun run tauri dev` | Useful when the failure surface is unknown and the cost of partial resets is compounding. |
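The “full nuclear option” row can be sketched as a small script. The directory names here (`build`, `.svelte-kit`, `src-tauri/target`) are assumptions about a typical SvelteKit + Tauri layout; a real repo should enumerate its own surfaces:

```python
import shutil
from pathlib import Path

# Known build surfaces in a hypothetical SvelteKit + Tauri repo.
# Adjust per project; the point is an explicit, complete list.
BUILD_SURFACES = ["build", ".svelte-kit", "src-tauri/target"]

def clean_build_surfaces(repo: Path) -> list[str]:
    """Remove every known build surface; return what was actually cleaned."""
    cleaned = []
    for surface in BUILD_SURFACES:
        path = repo / surface
        if path.exists():
            shutil.rmtree(path)
            cleaned.append(surface)
    return cleaned
```

After a full clean, rerun `bun run build` / `bun run tauri dev` so every artifact is rebuilt from the same state, and a surviving failure is a real one.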

Why “Run This Command” Breaks More Than It Should

People tend to assume the only cost is one extra round-trip in chat.

It is rarely just one.

Here is the pattern that actually shows up:

  1. The command text arrives through a rendering layer.
  2. The user copies it through another layer.
  3. The user runs it in a shell the agent is not currently inhabiting.
  4. The output comes back filtered by what the user noticed, not what the agent needed.

Now the agent is debugging two things at once: the original problem and the handoff path.

This is why the slowdown compounds. It’s a predictable systems outcome from adding a fragile human transport step where the agent already had terminal access.

The Seven Failure Modes (What Actually Causes the 3x)

These are some of my (least) favorite repeat offenders:

| Failure mode | Where it breaks | Why it multiplies loops |
| --- | --- | --- |
| Rendering corruption | Command text is damaged by formatting/rendering layers | Agent debugs a transcription artifact instead of the original issue |
| Copy/paste errors | Characters drop, wrap, or land in the wrong shell context | Returned output no longer maps cleanly to the proposed command |
| Environment mismatch | Agent state and user shell state differ | Diagnostics and fixes stop being reproducible across turns |
| Speed mismatch | User has to context-switch to relay output | Small checks become multi-turn delays |
| Noise filtering | Long logs hide the one line that matters | Agent receives excerpts, not the full failure surface |
| Implicit knowledge gap | Agent omitted a prerequisite step | User cannot infer hidden setup from partial instructions |
| Delegation boundary confusion | Agent offloads extraction labor it could do itself | Human effort is spent on mechanics, not judgment |

None of these are exotic, which is kind of the point. If the agent keeps asking for command output, these stop being edge cases and become the workflow.

The Principle

If you can create the failure you are expecting, you can usually read it and repair it before handoff.

This is also how senior engineers work with junior developers: “Don’t give me something I can’t run myself.”

It does not mean the agent should do everything. It means the user’s first manual step should be verification, not extraction.

Verification is high-value human involvement. “Copy this hash back to me” is usually not.

The Counterpoint That Sounds Reasonable (And Where It Fails)

Sometimes the agent asking the user to run the command is the fastest path: the environment is local or credentialed; access or hardware boundaries exist.

Yes.

But that is a boundary condition, not a rebuttal:

The principle is not “never ask users to run commands.”

The principle is “do not offload extraction labor when you already control the terminal.”

What Changes When You Follow This

The immediate gain is fewer chat turns.

The more important gain is cleaner debugging.

When the agent runs the command directly, it sees the full, unfiltered output, the exact environment the command ran in, and the immediate effect of each change.

That preserves context that is expensive to reconstruct later.

Why This Matters Beyond Terminal Commands

Is this just “terminal etiquette for non-coders”? Could be, but it has more value as an execution design rule for everyone.

The same mistake shows up anywhere an agent hands off a half-step because the next step “looks easy”: copy this hash, paste that log excerpt, relay this error line.

Each one sounds small. Each one can split a single loop into multiple loops.

Final Thoughts

A lot of workflow advice treats delegation as inherently efficient. In human organizations, sometimes it is.

In human-agent collaboration, it depends on who can absorb the friction without losing the plot.

If the agent can absorb the friction and return a verified step, it should. That is respect for what each partner in the contract finds non-trivial: the agent absorbs reproducible mechanical friction, and the user spends effort where human judgment, local context, and edge-case memory actually matter.
