GitHub Copilot CLI: how to review AI-suggested terminal commands before running them
GitHub Copilot CLI extends Copilot into the terminal through two commands: gh copilot suggest, which takes a natural language description and returns a shell command, and gh copilot explain, which takes a shell command and returns a natural language description of what it does. The workflow is the inverse of IDE Copilot: instead of receiving a suggestion while you type, you describe what you want and receive a command to run.
This sounds like a simple utility, but it introduces a set of review traps that are distinct from anything that appears in IDE-based AI coding tools. The reason is structural: shell commands and code have fundamentally different execution models, and that difference reshapes the review challenge in ways that IDE-focused review habits do not address.
The three traps
1. Execution-immediacy bypass
Code written in an IDE travels through a multi-stage validation pipeline before it affects anything real: you save, lint errors fire, tests run, a PR review happens, CI checks pass, and a deployment step moves the change to production. Each stage is a checkpoint. Each checkpoint is an opportunity to catch something before it causes damage. The pipeline exists precisely because no single stage catches everything.
Shell commands skip the entire pipeline. Pressing Enter is deploying. There is no lint pass that catches a wrong glob pattern. There is no test that verifies the command against your actual filesystem before it runs. There is no PR reviewer who notices that the xargs rm at the end of the chain is less targeted than you thought. The command runs immediately, against your real environment, with your real file permissions, as your current user. rm -rf, chmod 777, curl | bash — the execution is the deployment.
The attention trap is that gh copilot suggest presents the command in the same visual frame as any other terminal output: clean, complete, ready to use. The suggestion looks like the output of a tool you trust. It has the same visual weight as a git status result or an ls listing. Nothing in the presentation signals that this suggestion skips every validation stage that code passes through before it acts. “Looks right” is doing the work of the entire pipeline at once.
2. Pipe-chain opacity
Piped commands are how Unix composes behavior, and Copilot CLI generates them frequently because natural language requests naturally describe multi-step operations: “find all log files older than 7 days and delete them” becomes find . -name "*.log" -mtime +7 | xargs rm -f. Each stage looks readable. The chain as a whole reads as a logical sequence.
The opacity appears between the stages. Each | in a command removes visibility into the intermediate state. The grep in the middle of a chain produces a set of lines that the xargs at the end operates against — but you cannot see that intermediate set without stopping the command at that stage. When you review the full chain by reading it, you are simulating the intermediate states in working memory. That simulation is accurate for simple cases and unreliable for cases involving globs, environment variable expansion, file paths with spaces, symlinks, or unexpected matches.
The review trap compounds over pipe length. A two-stage command is easy to simulate accurately. A five-stage command with globs, exclusions, and substitutions requires holding five intermediate states simultaneously, each of which could diverge from your mental simulation. By the time you reach the destructive operation at the end — which is almost always at the end, because the final stage is the action and everything before it is targeting — your mental simulation has accumulated error from every prior stage. The command looks coherent at the level of reading; it may not match what will happen at the level of execution.
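A sandbox sketch makes the divergence concrete (all paths are throwaway names under mktemp, nothing real is touched): a filename containing a space survives the naive chain because xargs word-splits it, while the null-delimited variant behaves the way the mental simulation predicts.

```shell
# Sandbox demo: an intermediate stage diverging from the mental
# simulation when a filename contains a space.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch "app.log" "old log.log"

# Naive chain: xargs splits "old log.log" into two bogus words, so rm
# deletes app.log but silently leaves the spaced file behind.
find . -name "*.log" | xargs rm -f
left_after_naive=$(find . -name "*.log" | wc -l)   # the spaced file survives

# Null-delimited chain: -print0 / -0 keeps each path as one unit.
find . -name "*.log" -print0 | xargs -0 rm -f
left_after_safe=$(find . -name "*.log" | wc -l)

echo "after naive chain: $left_after_naive left; after -print0 chain: $left_after_safe left"
cd / && rm -rf "$tmp"
```

The chain you read and the chain that executes are the same text; only the intermediate state differs, which is exactly the part reading cannot see.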
3. Explain-as-validation
gh copilot explain produces a natural-language description of what a command does. It is fluent, well-structured, and covers the main semantics accurately in the majority of cases. Reading a clear explanation of a command creates a strong “understood” feeling — the same feeling that reading a good code comment creates. That feeling is the trap.
The explanation describes the command’s intended semantics in the abstract. It does not simulate the command against your specific filesystem, your specific environment, your specific file permissions, or the specific values of any environment variables the command references. If a glob pattern matches more files than you expect on your system, the explanation does not surface this. If an environment variable is unset, the explanation describes what would happen if it were set. If a directory path does not exist, the explanation assumes it does. The explanation is an accurate description of what the command means. It is not a prediction of what the command will do on your machine.
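A minimal sketch of the gap, with a deliberately hypothetical path: the explanation would happily describe a command that targets a directory which does not exist on this machine, while one line of shell answers the machine-specific question directly.

```shell
# Probe the machine-specific assumption instead of trusting the
# explanation. The path is a stand-in that is guaranteed not to exist.
dir="$(mktemp -d)/logs"   # e.g. the directory a suggested command targets

if [ -d "$dir" ]; then
  verdict="exists"
else
  verdict="missing"       # the suggested command would act on nothing
fi
echo "$dir: $verdict"
```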
The trust transfer is stronger than it appears because gh copilot explain is often used after gh copilot suggest as a verification step. The workflow becomes: suggest → read the explanation → run. The explanation feels like a review step between suggestion and execution. It occupies the position in the workflow where a review step should be. But “I understand what this command is supposed to do” and “I have verified that this command will do what I want on my system right now” are different claims. Explain answers the first question. You have to answer the second one yourself.
Three fixes
Dry-run destructive operations before running them. Before any command that modifies files, kills processes, changes permissions, or acts on network resources: simulate the destructive stage first. The simplest technique is to replace the destructive final stage with a harmless one, swapping xargs rm -f for xargs echo or piping into head -20, so you see what the preceding stages actually produce. find . -name "*.log" -mtime +7 | head -20 shows you the real set of files before xargs rm -f acts on it. For commands that support it, --dry-run is the built-in version of this simulation. For commands that don’t, run the pipeline up to the final stage, inspect the output, and only then re-run with the destructive action. This takes twenty extra seconds, and it is the single highest-value habit for terminal AI tools because the cost of a wrong deletion is asymmetric: twenty seconds of dry-run versus hours or days of recovery.
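The two-step habit, sketched in a throwaway directory (every path here is a sandbox stand-in): run the targeting stages with a harmless final stage, inspect the list, then swap the destructive stage back in.

```shell
# Sandbox setup: two stale logs and one file that must survive.
tmp=$(mktemp -d)
touch "$tmp/a.log" "$tmp/b.log" "$tmp/keep.txt"
# Backdate the logs so -mtime +7 matches them (GNU touch, BSD fallback).
touch -d '10 days ago' "$tmp/a.log" "$tmp/b.log" 2>/dev/null \
  || touch -t 202001010000 "$tmp/a.log" "$tmp/b.log"

# Step 1: dry run. Same targeting stages, harmless final stage.
preview=$(find "$tmp" -name "*.log" -mtime +7 | head -20)
echo "would delete:"
echo "$preview"

# Step 2: only after the preview matches intent, run the real action.
find "$tmp" -name "*.log" -mtime +7 | xargs rm -f
survivors=$(ls "$tmp")
echo "left behind: $survivors"
```

The preview and the real run share every targeting stage, so whatever the preview shows is exactly what the destructive stage will receive.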
Read the last stage of any pipe chain first. When reviewing a piped command, scan right-to-left instead of left-to-right. Find the last | and read what comes after it before reading the rest. The final stage is the action — the thing that will change your system. The earlier stages are targeting — they determine what the action applies to. Reading the action first tells you what kind of scrutiny the targeting deserves. If the last stage is head -5 or wc -l, the command is read-only and the targeting matters less. If the last stage is xargs rm -f, sed -i, chmod, or tee in write mode, the targeting is the entire security surface. Reading the action last — at the end of a left-to-right pass — is the worst possible order because by that point your working memory is full of “this makes sense” from the earlier stages, and you have almost no evaluation capacity left for the stage that matters most.
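A hypothetical helper for the habit: print the stage after the last pipe before anything else, so the action is the first thing you read. It splits naively on every pipe character, so a literal | inside quotes would fool it; treat it as a reading aid, not a parser.

```shell
# Naive last-stage extractor: prints everything after the final "|".
# (Splits on every "|" character, so a quoted pipe would fool it.)
last_stage() {
  printf '%s\n' "$1" | awk -F'|' '{ gsub(/^ +| +$/, "", $NF); print $NF }'
}

last_stage 'find . -name "*.log" -mtime +7 | xargs rm -f'
# prints the action stage first; then go back and audit the targeting
```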
Use explain to generate one question, not to confirm the answer. After running gh copilot explain, name one specific thing the explanation did not answer about your environment. The explanation tells you what the command is designed to do. The question you need to answer is what it will actually do against your specific situation. Useful questions: “what does this glob expand to in my current directory?” (run ls or find without the action to check), “what happens if this env var is unset?” (run echo $VAR to check), “is this command idempotent?” (matters if you might run it twice). Name the question before running the command, then answer it. This takes the explanation from a validation artifact — which it is not, but feels like one — to a question-generator, which is its actual value.
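Two of those questions, answered in a sandbox (directory and variable names are hypothetical): expand the glob without the action attached, and make an unset variable fail loudly instead of silently expanding to an empty string.

```shell
tmp=$(mktemp -d)
touch "$tmp/a.log" "$tmp/notes.txt"

# Q1: "what does this glob expand to?" Run the targeting without the action.
globbed=$(cd "$tmp" && echo *.log)
echo "glob expands to: $globbed"

# Q2: "what happens if this env var is unset?" ${VAR:?} aborts with a
# message instead of letting "$LOG_DIR/old" silently become "/old".
unset LOG_DIR
guard_msg=$( ( : "${LOG_DIR:?is unset, refusing to continue}" ) 2>&1 )
echo "$guard_msg"
rm -rf "$tmp"
```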
How this differs from IDE Copilot
The GitHub Copilot inline autocomplete review challenge is primarily an attention-and-reflex problem: fast suggestions arrive before evaluation, and Tab becomes a reflex instead of a decision. The Copilot Chat review challenge is primarily an explanation-trust problem: the fluent explanation makes code feel understood before it is verified. The Copilot Workspace review challenge is a plan-approval problem: approving the spec creates a false “reviewed” signal before any code is read.
GitHub Copilot CLI combines elements of all three — fast accept path, explain-trust, and an approval interface that feels like a review step — but the decisive difference is the execution model. Code has a deployment pipeline. Commands do not. Every review habit built for IDE AI tools assumes that code will travel through multiple validation stages before it affects anything real. That assumption is false for terminal commands. The fixes above are designed specifically for the environment where the suggestion and the execution are separated by nothing but a keypress.
Warp Terminal AI creates a similar challenge but through a different interface: Warp integrates AI into the shell session itself, so the boundary between suggestion and execution is even thinner. The dry-run and last-stage-first habits apply equally to Warp AI suggestions — the execution model is the same even though the suggestion interface is different.