Ellipsis AI: how to review code when a bot has already reviewed your PR before you open it
Ellipsis AI is a GitHub App that installs into your repository and automatically reviews every pull request the moment it opens. Without any developer action, Ellipsis reads the diff, identifies bugs, flags security issues, checks for style violations, and posts a structured review comment before the first human reviewer has clicked the PR link. It can also generate PR descriptions, summarize changes for non-technical stakeholders, and open its own pull requests to fix issues it finds. For teams managing high PR volume, the appeal is clear: every PR arrives with a first-pass review already attached, reducing the cognitive load of the reviewer who arrives next.
That always-on, pre-loaded review is exactly what makes Ellipsis worth analyzing carefully from a human review standpoint. CodeRabbit and Qodo Merge share the same integration pattern — a bot review that lands before the human opens the diff. What distinguishes Ellipsis is its emphasis on bug detection and security scanning rather than style feedback alone, which makes its review comments feel more authoritative than a linter-style pass. That authority is where the three review traps below originate.
The three Ellipsis AI review traps
1. First-read anchoring
The most common way to review a PR with Ellipsis installed is to open the PR, see the Ellipsis comment at the top, read what it found, and then open the diff. This ordering feels efficient — you have context about what to look for before you start reading. In practice, it is the most expensive review mistake you can make.
Reading the bot’s review before reading the diff anchors your attention to what Ellipsis flagged. Your brain enters the diff looking to confirm, explain, or dismiss those specific points — which means your attention is spent on the bot’s chosen territory rather than your own. Anything Ellipsis did not flag receives significantly less scrutiny not because you decided it deserved less scrutiny, but because your attention was already committed before you began. A logic error in a module the bot did not examine, an API misuse that falls outside its security patterns, a performance regression in a hot path it did not profile — these go unreviewed not by decision but by default.
The fix is a strict ordering rule: read the diff first, form your own list of concerns, and only then read the Ellipsis review to check for overlap and gaps. This takes discipline because the Ellipsis comment is visually prominent at the top of the PR page and feels like the natural starting point. It is not the starting point. It is a second opinion that is only useful after you have formed a first one. A second opinion read before the first collapses into a single opinion — the bot’s.
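One way to make the diff-first rule mechanical rather than willpower-dependent is to read the diff somewhere the bot comment cannot appear at all. Below is a minimal Python sketch against GitHub's REST API that pulls only the raw diff. It assumes a personal access token in a `GITHUB_TOKEN` environment variable; the owner, repo, and PR number are placeholders, not anything Ellipsis-specific:

```python
import os
import urllib.request


def fetch_pr_diff(owner: str, repo: str, number: int) -> str:
    """Return the raw diff of a PR, with no comments or review threads attached."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}",
        headers={
            # Requesting the diff media type returns the patch text instead of JSON.
            "Accept": "application/vnd.github.diff",
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder repository and PR number; substitute your own.
    print(fetch_pr_diff("acme", "billing-service", 482))
```

Reading the output in a terminal or editor, writing down your concerns, and only then opening the PR page turns the ordering rule into a physical workflow instead of a discipline test.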
2. Completeness illusion
GitHub’s PR interface makes review activity highly visible: comment counts, review status badges, unresolved threads. When Ellipsis posts a review with six comments and three of them have been resolved, the PR looks half-reviewed in the UI before any human has weighed in. This visual state is a completeness illusion: it signals review activity without representing review judgment.
The illusion compounds across the team. The PR author resolves Ellipsis’s comments, marks threads as resolved, and the unresolved count drops. The human reviewer arrives at a PR with two open threads — not the original six — and implicitly treats it as a PR that has already been substantially reviewed. The cognitive load of a PR with six unreviewed areas feels different from a PR with two unreviewed areas, even when the difference is entirely the bot’s accounting rather than a real human judgment about the other four.
The practical fix is to ignore resolved-thread counts on PRs where Ellipsis has been active and instead treat the diff as a fresh, unreviewed document. The number of Ellipsis comments resolved tells you something about whether the author addressed the bot’s specific concerns; it tells you nothing about how much of the actual code has received human attention. Keeping these two measures separate — bot activity and human review coverage — prevents the first from substituting for the second.
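If you want to report the two measures separately, GitHub's API makes the split straightforward, because GitHub App accounts carry a distinct user type. A rough sketch, again assuming a `GITHUB_TOKEN` environment variable and placeholder repo names; it counts only inline review comments and ignores pagination for brevity:

```python
import json
import os
import urllib.request

API = "https://api.github.com"
HEADERS = {
    "Accept": "application/vnd.github+json",
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
}


def review_comment_counts(owner: str, repo: str, number: int) -> dict:
    """Count inline review comments on a PR, split by bot vs. human author."""
    url = f"{API}/repos/{owner}/{repo}/pulls/{number}/comments"
    with urllib.request.urlopen(urllib.request.Request(url, headers=HEADERS)) as resp:
        comments = json.load(resp)
    # GitHub App accounts (such as a review bot) have user type "Bot".
    bot = sum(1 for c in comments if c["user"]["type"] == "Bot")
    return {"bot_comments": bot, "human_comments": len(comments) - bot}


if __name__ == "__main__":
    # Placeholder repository and PR number.
    print(review_comment_counts("acme", "billing-service", 482))
```

The useful signal is the human count in isolation: if it is zero, the PR has not been reviewed, no matter how many threads are marked resolved.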
3. Bot-calibration normalization
The most subtle trap develops over weeks, not individual reviews. Teams using Ellipsis consistently begin to calibrate their review expectations to what Ellipsis catches and what it misses. Issues that Ellipsis flags reliably, such as null pointer risks, common SQL injection patterns, and obvious off-by-one errors in loop bounds, become the visible category of “things code reviews find.” Issues that Ellipsis misses reliably, such as semantic correctness, business logic errors, cross-service contract violations, and data model assumptions that don’t survive edge cases, become invisible: not because the team stopped caring, but because the reference frame for “what review catches” has been shaped by the tool’s precision and recall.
This is the same drift that affects any automated tool in a review pipeline — GitHub Copilot Autofix creates it for security categories, Sweep AI creates it for bug-fix scope — but Ellipsis’s broad coverage across bug detection, security, and style makes the drift harder to notice. The tool appears to cover everything, so gaps in its coverage become invisible rather than obviously present.
The fix is a periodic calibration audit: once a month, pull a sample of merged PRs and review the diffs as if no bot had touched them. Compare what you find against what Ellipsis found at the time. The gap between your fresh read and the bot’s review is your team’s calibration debt — the category of issues that have been drifting out of your human review frame. Without this audit, the drift compounds silently until it surfaces as a production incident that a bot review would not have caught and a human review would have caught six months earlier.
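The sampling step of the audit can be scripted so the audit actually happens. A sketch under the same `GITHUB_TOKEN` assumption, with placeholder repo names; it looks only at the most recent 100 closed PRs, so a high-volume repository would need pagination:

```python
import json
import os
import random
import urllib.request
from datetime import datetime, timedelta, timezone

API = "https://api.github.com"
HEADERS = {
    "Accept": "application/vnd.github+json",
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
}


def sample_merged_prs(owner: str, repo: str, days: int = 30, k: int = 5) -> list:
    """Randomly sample PRs merged in the last `days` days for a bot-blind re-review."""
    url = (
        f"{API}/repos/{owner}/{repo}/pulls"
        "?state=closed&sort=updated&direction=desc&per_page=100"
    )
    with urllib.request.urlopen(urllib.request.Request(url, headers=HEADERS)) as resp:
        pulls = json.load(resp)
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    merged = [
        p for p in pulls
        if p["merged_at"]
        and datetime.fromisoformat(p["merged_at"].replace("Z", "+00:00")) >= cutoff
    ]
    return random.sample(merged, min(k, len(merged)))


if __name__ == "__main__":
    # Placeholder repository; print the sampled PRs to re-review from a fresh read.
    for pr in sample_merged_prs("acme", "billing-service"):
        print(pr["html_url"], "-", pr["title"])
```

Review each sampled diff as if no bot had touched it, then compare your findings against the Ellipsis comments posted at the time; the difference is the calibration debt described above.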
How to use Ellipsis AI without outsourcing your judgment
None of these traps argue against Ellipsis. For teams dealing with PR volume at scale, an always-on first-pass review that catches real bugs and security issues before a human opens the diff is genuine value. The traps are a direct consequence of how that value is delivered: a pre-loaded, visually prominent review that arrives before the human reviewer and shapes everything that follows.
Three adjustments make Ellipsis sustainable without surrendering review quality. First, enforce a diff-first ordering rule across the team: read the diff before reading the Ellipsis comment, always. This prevents first-read anchoring and costs nothing except breaking the habit of reaching for the bot summary first. Second, track human review coverage separately from bot activity; resolved Ellipsis threads are not a proxy for human attention. Third, run a monthly calibration audit on a sample of merged PRs to surface the categories of issues that Ellipsis consistently misses before those categories have normalized into invisibility.
The review traps above are not Ellipsis-specific failures. They are the general failure modes of pre-loaded automated review applied at the PR surface. Ellipsis happens to be a capable tool that makes these traps more likely, not less, because its coverage is broad enough to feel comprehensive when it isn’t. The habits above are what preserve the human judgment that makes automated review useful rather than replacing it.
Related reading: CodeRabbit covers the same pre-loaded bot review pattern with its own anchoring traps. Qodo Merge examines how AI-generated PR summaries shape reviewer expectations before the diff is read. GitHub Copilot PR code review covers the Copilot-native review integration and its context-window limits. Sweep AI explores automated bot PRs that fix issues directly. Sourcery AI examines the traps when AI refactors code silently before the commit, before any reviewer sees the original. How to review AI-generated code covers the general checklist for evaluating code from any automated source. For a full comparison of AI code review tools, see the best AI coding tools 2026 roundup.
Don’t let the bot’s review become your review
ZenCode prompts you to form your own read before reaching for the automated summary — one question that keeps the human judgment in front.
Try ZenCode free