Google Jules: how to review code when an async AI agent resolves GitHub issues overnight

2026-04-29 · 5 min read · ZenCode

Google Jules is an asynchronous AI coding agent. Unlike Cursor, Cline, or GitHub Copilot — which all respond in real time as you type — Jules operates on a different model entirely: you assign it a GitHub issue, it spins up a cloud environment, works independently, and opens a pull request when it is done. You are not watching it work. You are not guiding it through iterations. You come back later and there is a PR waiting for review.

That asynchronous model is the product’s core differentiator. It is also the source of three review traps that are entirely absent from synchronous AI coding tools. The traps do not arise because Jules is worse at writing code. They arise from the temporal gap between the moment you assigned the issue and the moment you open the PR. That gap changes your cognitive state in ways that systematically work against careful code review.

The three Google Jules attention traps

1. Morning-after context collapse

When you assign a Jules task, you have strong context. You’ve just been thinking about the bug, the feature request, or the technical constraint behind the issue. You know what correct behavior looks like, what edge cases matter, and what the existing code around the problem area does. That context is active working memory.

By the time Jules opens the PR — hours later, overnight, or across a context switch to other work — that working memory has dissipated. You open the diff with significantly less active understanding of the problem than you had when you created the task. The code Jules wrote may be correct, incorrect, or partially correct, but your ability to evaluate it critically is reduced by the same mechanism that makes morning code review of last night’s work harder than real-time review.

The additional distortion is that Jules’s PR arrives as a completed artifact. Synchronous tools surface one suggestion at a time, requiring you to evaluate each in context. Jules presents a finished branch with multiple commits that together implement a solution. The completeness creates a psychological closure effect: there is a beginning, a middle, and an end. Reviewing a complete solution feels like evaluating a finished thing rather than catching a problem before it solidifies.

The fix is to re-establish your own context before opening the PR. Before reading a single line of Jules’s diff, re-read the original issue and write down in a comment or scratch note what you expected the correct solution to look like. That thirty-second exercise reconstructs the mental model you had when you created the task. With that model active, you can evaluate the PR against your own prior expectation rather than evaluating it against the frame Jules chose to use.
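If you want to make that exercise mechanical, it is easy to script. The sketch below assumes the GitHub CLI (`gh`) is installed and authenticated; the file name, the `review-notes.md` scratch file, and the positional issue/PR arguments are illustrative choices, not part of any Jules workflow.

```ts
// prereview.ts - rebuild your own context before opening a Jules PR.
// Usage (illustrative): npx tsx prereview.ts <issue-number> <pr-number>
import { execSync } from "node:child_process";
import { appendFileSync } from "node:fs";
import * as readline from "node:readline/promises";

async function main() {
  const [issueNumber, prNumber] = process.argv.slice(2);

  // Step 1: re-read the original issue, not the PR description.
  console.log(execSync(`gh issue view ${issueNumber}`, { encoding: "utf8" }));

  // Step 2: write down the fix you expect *before* seeing any code.
  // This is the thirty-second reconstruction of your prior mental model.
  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  const expectation = await rl.question("Expected fix (one or two sentences): ");
  rl.close();
  appendFileSync("review-notes.md", `Issue #${issueNumber}: ${expectation}\n`);

  // Step 3: only now open the diff, and read it against your own note.
  console.log(execSync(`gh pr diff ${prNumber}`, { encoding: "utf8" }));
}

main();
```

The point of the script is not automation; it is that the prompt physically interposes your expectation between the issue and the diff.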

2. The issue-to-PR completeness frame

Jules takes a GitHub issue and produces a pull request. The natural cognitive frame for reviewing that PR is: did Jules fix the issue? That is a much narrower question than the right question, which is: is this PR good code?

GitHub issues are written to describe problems from the user’s or reporter’s perspective. They describe symptoms, surface behaviors, and observed failures. They rarely specify the correct implementation approach, the right level of abstraction for the fix, or the tradeoffs that the fix should optimize for. A narrow issue — “the login button is disabled after a failed attempt” — can be fixed at the symptom level (re-enable the button after the error state clears) or at the causal level (correct the state management logic that set the disabled flag incorrectly). Both solutions “fix the issue.” They are not equivalent code changes.
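To make that distinction concrete, here is a hypothetical sketch of the login-button bug in TypeScript; the state shape and function names are invented for illustration. Both versions make the reported symptom go away, but only one removes the class of bug.

```ts
// Hypothetical state for the reported bug: "the login button is
// disabled after a failed attempt". Names are invented for illustration.
type LoginState = { status: "idle" | "pending" | "error"; buttonDisabled: boolean };

// Symptom-level fix: re-enable the button when the error clears.
// The issue is "fixed", but any other code path that forgets to reset
// the flag reintroduces the same bug.
function onErrorCleared(state: LoginState): LoginState {
  return { ...state, status: "idle", buttonDisabled: false };
}

// Causal fix: stop storing the flag and derive it from status, so no
// code path can leave the two out of sync.
type DerivedLoginState = { status: "idle" | "pending" | "error" };
const isButtonDisabled = (state: DerivedLoginState): boolean =>
  state.status === "pending"; // disabled only while a request is in flight
```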

When Jules presents a PR, the natural review question — did it fix the issue? — answers itself. Jules fixed the issue. That is why there is a PR. The completeness frame makes it hard to ask the harder question: did it fix the issue in the right way? Reviewers who start from the issue description and trace forward to Jules’s solution are implicitly auditing Jules’s choice of approach. Reviewers who start from Jules’s PR description and trace backward to the issue are implicitly validating that Jules addressed what was asked. Those two directions produce different review outcomes for the same diff.

The fix is to evaluate the PR as a code change before evaluating it as an issue resolution. Read the diff first, without looking at the issue. Ask: what does this code change do? Is it correct? Is it the right approach? Does it introduce any risks? Then read the issue and ask: does this change address the reported problem? Separating those two questions prevents the narrower second question from collapsing into the first.
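A small script can enforce that ordering. This sketch again assumes the GitHub CLI; note that it deliberately calls `gh pr diff` rather than `gh pr view`, so the agent's own PR title and description never get the chance to frame your first reading.

```ts
// diff-first.ts - review the code change before reading the issue.
// Usage (illustrative): npx tsx diff-first.ts <pr-number> <issue-number>
import { execSync } from "node:child_process";
import * as readline from "node:readline/promises";

async function main() {
  const [prNumber, issueNumber] = process.argv.slice(2);
  const sh = (cmd: string) => execSync(cmd, { encoding: "utf8" });

  // Phase 1: the raw diff only, with no PR title or body to frame it.
  console.log(sh(`gh pr diff ${prNumber}`));

  // Hold here until the code has been judged on its own terms.
  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  await rl.question("Is this change correct, and the right approach? Press Enter to see the issue. ");
  rl.close();

  // Phase 2: only now read the issue and ask the narrower question:
  // does the change address the reported problem?
  console.log(sh(`gh issue view ${issueNumber}`));
}

main();
```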

3. Clean-commit-history trust

Jules produces clean, well-structured commits. Each commit is bounded to a coherent change, the commit messages are descriptive and grammatically correct, and the overall branch history reads like work done by a careful engineer. Clean commit history is normally a strong positive signal in code review — it suggests the author thought through what they were doing and communicated it clearly.

With Jules, clean commit history means something different. The commit messages are generated to describe the changes Jules made, not to communicate the reasoning Jules used to choose those changes. A well-written commit message from Jules is evidence that Jules understood how to write a commit message. It is not evidence that the underlying decision — which abstraction to use, which edge case to handle, which existing pattern to follow — was correct. The message is a description; the decision is a separate thing.

This matters because commit message quality is one of the heuristics reviewers use to calibrate their scrutiny of the underlying code. A commit message like “Refactor auth state handler to use reducer pattern for consistent disabled-state transitions” communicates clearly, implies the author had a rationale, and lowers reviewer skepticism before the diff is even opened. That reduced skepticism is appropriate when the message represents a human engineer’s actual reasoning. It is inappropriate when the message was generated to describe a change whose rationale exists only as a sequence of model weights.

The fix is to treat Jules commit messages as structural descriptions, not engineering rationales. When a commit message implies a design decision — “use reducer pattern,” “extract to service layer,” “consolidate error handling” — evaluate the decision independently. Ask whether that approach is correct for this codebase, not whether it is consistent with what the commit message describes. The message accurately describes the change. The change may or may not be the right one.
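One way to operationalize this is to scan the branch's commit headlines for decision-implying verbs before trusting any of them. A rough sketch, assuming the GitHub CLI's `gh pr view --json commits` output; the verb list is a heuristic invented here, not anything Jules exposes.

```ts
// commit-audit.ts - list a PR's commit headlines and flag the ones that
// imply a design decision, so the decision gets reviewed independently
// of how well the message reads.
// Usage (illustrative): npx tsx commit-audit.ts <pr-number>
import { execSync } from "node:child_process";

const prNumber = process.argv[2];
const out = execSync(`gh pr view ${prNumber} --json commits`, { encoding: "utf8" });
const { commits } = JSON.parse(out) as { commits: { messageHeadline: string }[] };

// Verbs that usually signal an architectural choice rather than a
// mechanical change. A rough heuristic; tune it for your codebase.
const decisionVerbs = /refactor|extract|consolidate|migrate|introduce|rework/i;

for (const { messageHeadline } of commits) {
  const tag = decisionVerbs.test(messageHeadline)
    ? "DECISION - evaluate independently"
    : "mechanical";
  console.log(`${tag}: ${messageHeadline}`);
}
```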

What makes async agents different from synchronous ones

The traps above are not specific to Jules. They apply to any asynchronous coding agent that accepts a task and returns a completed PR. Devin, OpenHands running in headless mode, and similar tools produce the same review challenges. The common thread is the temporal separation between task assignment and review. That gap degrades active context, primes the reviewer to evaluate completion rather than quality, and makes generated signals of quality — clean commit history, structured PRs, descriptive titles — look more meaningful than they are.

Jules is a genuinely capable tool for the tasks it is designed for: autonomous resolution of well-scoped issues on codebases it can understand. The value proposition is real — moving work off the developer’s plate and into an agent’s hands is a meaningful productivity gain when the agent is reliable enough. The review discipline required to capture that value safely is also real: rebuild your context before you open the diff, evaluate the code before you evaluate the issue resolution, and read past the quality of the commit messages to the quality of the decisions they describe. None of that discipline is harder than reviewing a synchronous AI suggestion. It is just applied at a different moment, after a longer gap, with more completed work waiting for a response.

Related reading: Devin autonomous agent review · OpenHands autonomous agent · Claude Code terminal agent · Plandex multi-file agent · GitHub Copilot Workspace

