Amp Code: how to review AI-generated code when a codebase-indexed CLI agent references your actual functions, calls, and patterns

2026-04-30 · 5 min read · ZenCode

Amp Code is Sourcegraph’s CLI-based coding agent, built on the same code intelligence infrastructure that powers Sourcegraph’s code search platform. Where most AI coding tools receive context by reading files you share explicitly or by scanning a local directory, Amp Code is designed around Sourcegraph’s code graph: a structural index of every function, class, import, call relationship, and type dependency in a repository. The agent queries this graph during task execution, enabling it to reference actual function signatures, follow real call chains, and use the patterns that already exist in your specific codebase rather than generic patterns from its training data.

The output of this architecture is code that looks deeply codebase-aware. Amp Code generates suggestions that reference functions by their actual names, use the types and interfaces already defined in your repo, and mimic the structural conventions your team has established. That surface accuracy is real — it comes from the graph index. It also creates three review traps that are specific to this architecture and do not appear in coding agents without graph-indexed codebase context.

The three Amp Code review traps

1. Codebase-indexed authority illusion

When Amp Code generates a function that correctly calls paymentService.charge(orderId, amount, currency) with the exact parameter order your codebase uses, references the PaymentResult type defined in your src/types/payments.ts, and follows the error-handling pattern your team adopted six months ago, the output carries an authority signal that generic AI suggestions do not. It looks like the work of someone who read your codebase carefully. That impression is partially correct: Amp Code retrieved the structural facts from the code graph. What it retrieved is different from what the impression implies.

The code graph shows what exists: function signatures, type definitions, import relationships, call chains. It does not encode the rationale behind those structures, the constraints they were designed to satisfy, or the contexts in which they are appropriate to use. Amp Code knows that paymentService.charge accepts those three parameters. It does not know that the currency parameter must be an ISO 4217 code enforced by a validation layer upstream, or that calling charge without first calling paymentService.reserve will produce an idempotency violation in the payment provider, or that the error handling pattern it replicated was designed for synchronous operations and behaves differently in async contexts. These are semantic constraints that live in team knowledge, documentation, and operational experience — not in the structural graph.
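To make the gap concrete, here is a minimal TypeScript sketch of the distinction, built from the hypothetical payment example above. The paymentService object, the reserve method, and the ISO 4217 check are stubs standing in for the constraints described; none of this is real Amp Code output or a real payment API.

```typescript
// A minimal sketch of the distinction, with stubs standing in for the
// hypothetical payment module described above. None of this is Amp Code
// output; it shows what the graph can and cannot verify.

type PaymentResult = { ok: boolean; transactionId?: string };

const paymentService = {
  // Provider-side hold. The team rule "reserve before charge" lives in
  // operational knowledge, not in this signature.
  async reserve(orderId: string, amount: number): Promise<void> {},
  async charge(orderId: string, amount: number, currency: string): Promise<PaymentResult> {
    return { ok: true, transactionId: `txn-${orderId}` };
  },
};

const ISO_4217 = new Set(["USD", "EUR", "GBP"]); // truncated for the sketch

// Structurally accurate: real names, real types, correct parameter order.
// This is everything the code graph can confirm.
async function chargeOrder(orderId: string, amount: number, currency: string): Promise<PaymentResult> {
  return paymentService.charge(orderId, amount, currency);
}

// Semantically correct under the constraints the graph does not encode:
// validated currency first, reserve before charge.
async function chargeOrderSafely(orderId: string, amount: number, currency: string): Promise<PaymentResult> {
  if (!ISO_4217.has(currency)) throw new Error(`invalid currency: ${currency}`);
  await paymentService.reserve(orderId, amount);
  return paymentService.charge(orderId, amount, currency);
}
```

Both functions would pass a structural check against the code graph. Only a reviewer who knows the reserve-before-charge constraint can tell them apart.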

The fix is to read structurally accurate code with the same scrutiny you would apply to generic AI output. Structural accuracy means the code compiles and references real things. It does not mean the code is correct for the specific context. The codebase-indexed output raises a different set of questions than hallucinated output — not “does this function exist?” but “is this function appropriate here, used correctly, and complete for this case?” Sourcegraph Cody creates the same trap in the IDE context: when suggestions reference your actual repository, the structural match becomes a plausibility signal that is stronger than the underlying correctness evidence warrants.

2. Plan-before-execution confidence

Before making changes, Amp Code shows a plan describing what it will do: which files it will modify, what functions it will create or change, how it will structure the solution. The plan is specific and legible — “I will add a validateOrderItems helper to src/orders/validation.ts, call it from processOrder in src/orders/processor.ts, and add a corresponding test in tests/orders/processor.test.ts.” Seeing a specific plan before execution creates the feeling that you know what will happen and have pre-evaluated it.

The plan describes structural intentions: which files, which functions, what the change is. It does not describe behavioral outcomes: whether validateOrderItems correctly handles partial orders, whether the test covers the edge cases that matter, whether separating validation into a helper is the right abstraction for where this codebase is heading. These are the questions code review is for, and they cannot be answered from the plan — only from the resulting code.
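A hypothetical sketch of that gap, using the validateOrderItems helper from the example plan. The Order shape and the partial-order rule are invented for illustration:

```typescript
// A hypothetical illustration of plan adherence versus behavioral
// correctness. validateOrderItems comes from the example plan above; the
// Order shape and the partial-order rule are invented for this sketch.

type OrderItem = { sku: string; quantity: number };
type Order = { items: OrderItem[]; allowPartial?: boolean };

// Matches the plan exactly: a validation helper in src/orders/validation.ts.
// A plan-adherence review stops here.
function validateOrderItems(order: Order): boolean {
  return order.items.every((item) => item.quantity > 0);
}

// The behavioral question the plan never posed: Array.prototype.every is
// vacuously true on [], so a partial order with no items validates silently.
// Whether that is correct is a business rule, not a structural fact.
const result = validateOrderItems({ items: [], allowPartial: true }); // true
```

The implementation matches the plan line for line. Whether returning true for an empty items array is correct is a behavioral question the plan never raised, and only reading the code surfaces it.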

After execution, the plan creates a post-review shortcut. The developer who approved the plan is primed to verify that the code matches the plan rather than to evaluate whether the plan’s execution is correct. “The plan said add a null check, and the code added a null check” confirms plan adherence, not behavioral correctness. The review that follows plan approval tends to be faster and less adversarial than the review that begins with no prior frame. The plan is a genuine improvement over a black-box agent that makes changes without explanation — it makes the agent’s intentions legible. The risk is that plan legibility gets converted into review confidence before the results have been evaluated. OpenAI Codex CLI creates a related pattern: the agent’s step-by-step terminal output narrates its actions as it goes, and the running commentary converts an opaque execution into an apparently reviewed sequence without the developer having evaluated the outputs directly.

3. Cross-file familiarity misread as comprehension

Amp Code can retrieve and apply patterns from anywhere in a large codebase, including modules you rarely or never work with. When a suggestion correctly uses a helper function from a module you last opened months ago, mimics an error-handling convention from a service you don’t own, or follows a testing pattern from a part of the codebase maintained by a different team, it creates the impression that the agent comprehends the codebase as a whole — that it understood the full system and selected the right approach. That impression is stronger than the evidence supports.

What Amp Code retrieved was the pattern’s structure. Whether the pattern is appropriate for your context requires knowledge that is not in the pattern itself: why it was written that way, what problem it was solving, whether that problem applies here, and whether the pattern has known limitations that aren’t visible in the code. A retry pattern copied from an internal HTTP client might be correct for HTTP calls and wrong for database operations. A caching strategy replicated from a read-heavy service might be appropriate there and create race conditions in a write-heavy context. The code graph shows the pattern; it does not show the rationale or the constraints.
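A sketch of that retry example, with invented names (withRetry, httpClient, and db are hypothetical; the pattern itself is generic). The point is that the pattern’s safety depends on context the code graph does not carry:

```typescript
// A sketch of the retry example, with invented names (withRetry, httpClient,
// db). The pattern is generic; its safety depends on context the code graph
// does not carry.

async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err; // swallow and retry; assumes fn is safe to repeat
    }
  }
  throw lastError;
}

// Appropriate in the module it was copied from: a GET is idempotent, so
// retrying a timed-out call changes nothing.
//   await withRetry(() => httpClient.get("/inventory"));

// The same pattern wrapped around a write: if insertPayment succeeded but
// the acknowledgment was lost, the retry inserts a duplicate row.
// Structurally identical usage, behaviorally wrong context.
//   await withRetry(() => db.insertPayment(payment));
```

The two call sites are indistinguishable to a structural index. The idempotency assumption that makes one safe and the other dangerous exists only in team knowledge.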

Cross-file retrieval also creates a version of the coverage illusion: because Amp Code can see the whole codebase, it feels like it considered the whole codebase. But retrieval and reasoning are different. Retrieving a pattern from file A and applying it in file B is not the same as analyzing whether file B’s context makes the pattern appropriate. The question to ask of cross-file suggestions is not “does this pattern exist in our codebase?” — it clearly does — but “is this the right pattern for this specific use case, and am I confident enough in that to accept it without deeper evaluation?”

Using Amp Code without letting graph context replace your review

Amp Code’s codebase graph indexing is a genuine capability improvement over coding agents that operate without codebase context. Suggestions that reference real functions and real types reduce a large class of integration errors that appear in generic AI output. The plan-before-execution model makes the agent’s actions transparent in a way that benefits both review and debugging. Cross-file retrieval surfaces patterns your team has already established, encouraging consistency. None of these are reasons to avoid Amp Code — they are reasons to understand what each capability provides and what it does not.

The general defense is to treat structural accuracy as a baseline expectation, not as a review signal. Amp Code is expected to reference real functions and follow real patterns — that is what the graph index is for. The review question begins where structural accuracy ends: is this the right function, used correctly, in the right context, for the right reason? The plan told you what the agent intended. The code tells you what happened. Only the code tells you whether what happened is what was needed. Reading the plan carefully is not a substitute for reading the result carefully, and structural accuracy in the result is not a substitute for evaluating its behavior.


Related reading: Sourcegraph Cody on how in-IDE suggestions that reference your actual codebase create a structural-match authority signal that can displace independent correctness evaluation. Claude Code on the review habits specific to CLI coding agents that operate across your full filesystem. OpenAI Codex CLI on how a narrated agent execution can create the feeling of a watched and reviewed process without the review having happened. How to review AI-generated code for the general five-check framework that applies across all AI coding tools.

Amp Code knows your codebase structure. ZenCode asks whether you verified the behavior.

ZenCode surfaces one concrete review question before you accept — separate from what the graph index found, what the plan described, or which patterns were retrieved.

Try ZenCode free

More posts on AI-assisted coding habits