How to review AI-generated code: a practical checklist
The review habits that work for human-written code do not work for AI-generated code. When a colleague writes a function, your review starts from a baseline of “probably correct, check the edge cases.” When an AI writes the same function, the correct baseline is “unknown correctness, check everything” — but that is not how most developers approach it in practice.
What actually happens: the AI produces something that compiles, the IDE shows no errors, and a quick scroll feels like a review. The code ships. Three days later an edge case fires in production: a null pointer, a missing auth check, a library that was hallucinated and replaced by a similarly-named package that does something subtly different.
The problem is not that AI code is bad. It is that AI code bypasses the slow-reading instinct that manual review normally triggers. When a colleague writes messy code, you slow down. When an AI writes clean-looking code with consistent naming and zero linting warnings, your guard goes down at exactly the moment it should go up.
These five checks address that gap. They are ordered by where AI code most commonly fails and by how fast each check runs.
Check 1: Read the imports block before reading any logic
The first 10–15 seconds of reviewing AI-generated code should go to the imports section, not the function body. AI models hallucinate package names, use deprecated library versions, mix incompatible library generations (e.g. AWS SDK v2 and v3 in the same file), and import libraries that exist but do not have the specific method being called.
What to look for: any import you do not recognize, any version qualifier that differs from your lockfile, any library that has two competing packages in your ecosystem (the old REST-based client and the new gRPC-based one, for example). If you cannot verify an import in your own mental model in 5 seconds, look it up. The time cost of one package check is lower than the time cost of debugging a runtime crash from a missing method.
This check is especially important when using tools trained heavily on a specific cloud provider’s SDK patterns — the familiarity of AWS, GCP, or Azure boilerplate is exactly what makes a wrong import easy to miss.
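The lockfile comparison itself can be scripted. The following is a minimal sketch in TypeScript, assuming a Node project with a package.json; the regex, file paths, and output format are illustrative choices, not part of the checklist. It lists a file's external imports and flags any package that is not declared, so a hallucinated or mistyped dependency surfaces before the code ever runs.

```typescript
// Sketch: list the external packages a file imports and flag any not declared in package.json.
// Assumes Node + TypeScript; the regex is deliberately simple and not a full parser.
import { readFileSync } from "node:fs";

function externalImports(source: string): string[] {
  // Match `import ... from "pkg"` and `require("pkg")`; skip relative paths and node: built-ins.
  const pattern = /(?:from\s+|require\()\s*["']([^."'][^"']+)["']/g;
  const packages = new Set<string>();
  for (const match of source.matchAll(pattern)) {
    const specifier = match[1];
    if (specifier.startsWith("node:")) continue;
    const parts = specifier.split("/");
    // Reduce "@aws-sdk/client-s3/commands" to the package name "@aws-sdk/client-s3".
    packages.add(specifier.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0]);
  }
  return [...packages].sort();
}

const manifest = JSON.parse(readFileSync("package.json", "utf8"));
const declared = new Set([
  ...Object.keys(manifest.dependencies ?? {}),
  ...Object.keys(manifest.devDependencies ?? {}),
]);

// Usage: `npx ts-node check-imports.ts src/generated.ts`, then verify every MISSING line by hand.
for (const pkg of externalImports(readFileSync(process.argv[2], "utf8"))) {
  console.log(`${declared.has(pkg) ? "declared" : "MISSING "}  ${pkg}`);
}
```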
Check 2: Find the error path before reading the happy path
Read the code backwards: start from what happens when something goes wrong. What happens when the input is null or empty? What happens when the API call returns a 500? What happens when the database returns zero rows instead of one?
If you cannot find the error handling in 15 seconds, it is not there. AI tools frequently generate happy-path logic with plausible-looking structure that simply drops errors, silently returns undefined, or crashes with an unhelpful message. The happy path always looks correct. The error path is where the gap is.
The concrete check: before accepting any function, scan for how it handles the null case, the empty-collection case, and the external-call-failure case. Three specific checks take less time than a general review and catch the most common failure modes.
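For reference, here is what those three cases look like when they are handled, as a minimal sketch around a hypothetical lookup function; the endpoint URL, the response shape, and a runtime with a global fetch (Node 18+ or a browser) are assumptions for illustration.

```typescript
// Hypothetical lookup used to illustrate the three error paths named above.
type User = { id: string; email: string };

async function findUserByEmail(email: string): Promise<User | null> {
  // Null/empty-input case: decide explicitly instead of letting fetch build a bad URL.
  if (!email || !email.trim()) {
    return null;
  }
  const response = await fetch(
    `https://api.example.com/users?email=${encodeURIComponent(email)}`
  );
  // External-call-failure case: surface the status instead of parsing an error page as JSON.
  if (!response.ok) {
    throw new Error(`findUserByEmail failed: HTTP ${response.status}`);
  }
  const users: User[] = await response.json();
  // Empty-collection case: zero rows is a normal outcome, not the same as a malformed response.
  return users.length > 0 ? users[0] : null;
}
```

If the function you are reviewing has no equivalent of these three branches, that is the gap to raise before accepting it.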
Check 3: Name one concrete edge case before accepting
This check is behavioral rather than technical. Before pressing Tab or clicking Accept, say to yourself (or type into a comment) one specific thing that could go wrong with this code. Not “looks good” but “what happens if the user passes an empty string here?” or “does this handle the case where the list has one element?”
The reason this works: it converts passive reading into active evaluation. Reading is not reviewing. Reading lets the surface fluency of AI-generated code trigger a “looks right” response before you have actually evaluated correctness. Requiring yourself to name one specific failure mode forces the slow cognition that review requires.
If you cannot name a concrete edge case, that is useful information too: it means you do not yet understand the code well enough to accept it.
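In practice the named edge case can be as small as a one-line comment above the generated code. A hypothetical example, with both the function and the case invented for illustration:

```typescript
// Edge case to verify before accepting: splitFullName("Cher") has no last name.
// Confirm lastName comes back as "" rather than undefined before this ships.
function splitFullName(fullName: string): { firstName: string; lastName: string } {
  const [firstName = "", ...rest] = fullName.trim().split(/\s+/);
  return { firstName, lastName: rest.join(" ") };
}
```

The comment costs a few seconds to write; it is the act of naming the case, not the comment itself, that does the work.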
Check 4: Check security boundaries at system edges
AI models are trained on examples that often elide security handling, because correct security handling is verbose and domain-specific. The result: AI-generated code frequently makes syntactically correct calls that rest on the wrong security assumption.
The four places to check automatically: (1) any SQL query construction — look for string interpolation that should be a parameterized query; (2) any HTML rendering — look for unescaped user input; (3) any authentication check — look for the assumption that a user object exists when it might not; (4) any file path construction — look for user-controlled input that could traverse the directory structure.
These are not exotic attack vectors. They are the four most common AI-generated security gaps, and they appear in code that looks correct on a quick scroll because the surrounding structure is fine — the gap is a missing step inside an otherwise reasonable function.
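As a concrete reference for gaps (1) and (3), here is a minimal sketch of the shape to flag next to the shape to accept, assuming node-postgres (pg) for the query and an Express-style request object for the auth check; the table name and types are illustrative.

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from environment variables in this sketch

// (1) SQL construction: the interpolated version is what generated code often looks like on a quick scroll.
async function findUserUnsafe(email: string) {
  return pool.query(`SELECT * FROM users WHERE email = '${email}'`); // injectable: flag this in review
}
async function findUserSafe(email: string) {
  return pool.query("SELECT * FROM users WHERE email = $1", [email]); // parameterized: accept this
}

// (3) Authentication: generated middleware often assumes the user object is always present.
interface AuthedRequest {
  user?: { id: string };
}

function requireUser(req: AuthedRequest): { id: string } {
  if (!req.user) {
    throw new Error("Unauthenticated"); // the missing step: fail loudly instead of dereferencing undefined
  }
  return req.user;
}
```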
Check 5: Start at the last generated block, not the first
When AI generates multiple sections in a single response — multiple functions, a class with several methods, a component with helper utilities — the first section is usually the most polished. The start of the output is anchored most tightly to your request; errors, overreach, and speculative additions accumulate toward the end.
The check: jump to the last function or last section before reading from the top. Read enough to understand what it does and whether it should exist at all. Speculative code (functions you did not ask for, utility methods added “in case”) is most likely to appear at the end. Once you have seen the full scope, work backwards. By the time you reach the first section, you know what the model decided to generate beyond your request.
This check is particularly important for agentic tools — Cursor Composer, Cline, Aider, GitHub Copilot Workspace — where a single task generates edits across multiple files. The last file in the diff is where speculative changes are most likely to appear.
Why these five checks and not a longer list
Longer checklists do not get used under time pressure. These five are ordered so that even if you only run three, you catch the most common failure modes: wrong dependencies (Check 1), missing error handling (Check 2), and security gaps (Check 4). Checks 3 and 5 are behavioral forcing functions that improve the quality of every other check by keeping you in active evaluation rather than passive reading mode.
The full checklist takes under two minutes on a typical function. That is less time than debugging the production issue that missing one of these checks will eventually cause.
Tool-specific variations
The five checks above are tool-agnostic. Each AI coding tool also has specific bypass mechanisms that are worth understanding: the Tab reflex that inline autocomplete creates, the approval fatigue that agentic tools generate, the authority bleed that happens when the IDE interface lends credibility to AI output. In-depth posts on each tool in this series cover those specific mechanisms:
Tool-specific guides in this series
- Cursor inline autocomplete: breathing exercises for developers
- Claude Code: how to stop doom-scrolling while it generates
- GitHub Copilot generation pauses: how to use the wait
- Windsurf IDE and Cascade: staying focused during long AI generation runs
- Cline AI agent: how to stay in review mode when the agent codes for minutes
- Aider AI pair programmer: how to review diffs when the agent edits in bulk
- Continue.dev inline edits: staying focused when the diff replaces your code
- Tabnine autocomplete: catching subtle errors when completions arrive fast
- Bolt.new: how to review generated code when the live preview looks correct
- Replit Agent: how to review generated code when the sandbox handles everything
- v0 by Vercel: how to review generated UI code before you paste it
- JetBrains AI Assistant: how to review completions when the IDE looks like it approved them
- Cursor Composer: how to review AI-generated multi-file edits before applying them
- Amazon Q Developer: reviewing inline suggestions when AWS patterns lower your guard
- Gemini Code Assist: reviewing suggestions when GCP patterns feel like documentation
- GitHub Copilot Workspace: how to review AI-generated plans and code before pushing
- Sourcegraph Cody: reviewing suggestions when codebase context creates false confidence
- Why taking micro-breaks while AI coding isn’t slacking off
ZenCode — stay in review mode between AI generations
A VS Code extension that fires a 10-second breathing pause during AI generation gaps. Keeps you in active evaluation instead of passive reading mode — so the five checks above actually happen.
Get ZenCode free
Related reading
- Bito AI: how to review code when an AI reviewer has already flagged the issues
- Best AI coding tools 2026: review habits compared across 20 tools
- Vibe coding fatigue: what it is, and why it feels worse than regular coding
- The hidden cost of context switching between AI prompts
- ChatGPT code review: what happens to your judgment when the chat window explains your code
- GitHub Copilot Chat: how to review code when the chat interface explains it for you
- Lovable.dev: how to review AI-generated app code when everything looks finished
- Qodo Gen: how to review code when AI-generated tests make it feel already verified
- Cursor AI: how to review code when the IDE itself is the AI
- OpenHands: how to review code when an autonomous agent builds the whole feature
- Pieces for Developers: how to review AI suggestions when the tool knows your entire workflow
- GitHub Copilot CLI: how to review AI-suggested terminal commands before running them
- GitLab Duo Code Suggestions: how to review AI suggestions when the CI pipeline makes code feel already approved
- Sweep AI: how to review code when a bot writes the entire PR from your issue
- GitHub Copilot code review: how to maintain your judgment when AI reviewer comments arrive in your PR thread
- Firebase Studio: how to review AI-generated full-stack code in Google’s cloud IDE
- GitHub Copilot Autofix: how to review AI-generated security patches when GitHub fixes vulnerabilities in your code
- Roo Code: how to review code when a multi-agent orchestrator plans and executes in parallel sub-agents