Devstral: how to review code when Mistral’s open-weights coding agent runs commands and writes diffs on your machine

2026-04-30 · 5 min read · ZenCode

Devstral is Mistral AI’s open-weights coding agent model, designed specifically for multi-step software engineering tasks. Unlike Codestral, which is a fill-in-the-middle completion model, Devstral is an agentic model: it reads files, runs terminal commands, writes diffs, and iterates until a task is done or it gets stuck. Because it is open-weights, you can run it locally — via Ollama, LM Studio, or a self-hosted inference server — and pair it with tools like Aider, OpenAI Codex CLI-compatible frontends, or a simple agent harness of your own. The model executes against your local filesystem with your own credentials and your own terminal process. There is no cloud sandbox between Devstral and your codebase.
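
To make the setup concrete, here is a minimal sketch of sending a task to a locally served Devstral through Ollama's /api/chat endpoint. The model tag devstral and the default port are assumptions about your local install, and a real agent harness would loop over tool calls rather than send a single turn; this only shows the transport.

```python
# Minimal sketch: one coding task sent to a locally served Devstral
# via Ollama's /api/chat endpoint. Assumes you have already run
# `ollama pull devstral` and the server listens on its default port.
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default endpoint


def ask_devstral(task: str) -> str:
    """Send a single-turn prompt and return the model's reply text."""
    payload = {
        "model": "devstral",  # model tag assumed; check your `ollama list`
        "messages": [{"role": "user", "content": task}],
        "stream": False,  # one complete JSON response instead of a stream
    }
    resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["message"]["content"]


if __name__ == "__main__":
    print(ask_devstral(
        "Read src/auth.py and propose a diff that adds "
        "rate limiting to the login endpoint."
    ))
```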

That direct, local execution model is what makes Devstral worth analyzing carefully from a code-review standpoint. Google Jules, Devin, and Cursor Background Agents all operate in isolated environments (cloud VMs, sandboxed containers) that put a hard boundary between the agent and the machine you work on. Devstral running locally has no such boundary. The three review traps below are a direct consequence of that proximity.

The three Devstral review traps

1. Local execution trust transfer

When you run Devstral locally, the experience is cognitively close to running your own shell scripts. The terminal is your terminal. The file changes land in your working directory. The process has the permissions you have. This proximity collapses a distinction that remote agent tools preserve by design: the difference between code the agent generated and code you wrote.

With a cloud-hosted agent, there is a clear interface boundary. The agent operates somewhere else and delivers a result through a PR, a diff view, or an API response. The foreign delivery surface acts as a reminder that this output came from somewhere outside your deliberate control. Devstral running in your terminal does not create that reminder. The diff appears in your editor the same way a local edit would. The tests run in your test runner. The output looks and feels like work you did, because the execution environment is indistinguishable from the one you use when you do work yourself.

This trust transfer is invisible and automatic. You did not decide to trust the output more because it ran locally — the trust arrived without a decision. The fix is a deliberate break. Before reviewing a Devstral diff, write one sentence describing what the code should accomplish and what it must not break. That sentence is yours. The diff is not. Reviewing the diff against your sentence re-establishes the distinction that local execution erased.

2. Planning loop diff opacity

Devstral is a reasoning-capable agent model. Before writing code, it reasons through the problem — reading relevant files, identifying dependencies, planning an approach. During execution, it may try an approach, encounter an error in tool output, and revise. The diff you receive at the end of this process is the final state: the cleaned-up result of a multi-step sequence of attempts, pivots, and error recoveries.

What the diff does not show is the path. A guard clause that looks arbitrary in the final code may have been added in response to a specific runtime error the model encountered mid-execution. A structural decision that looks overcomplicated — an extra abstraction layer, an unexpected interface boundary — may reflect an approach the model settled on after an earlier simpler approach failed. The diff presents the endpoint as if it were the original intent, and reviewers tend to evaluate it that way.

This is the same trap that appears in GitHub Copilot agent mode and Junie: the visible output hides the generative process that produced it. With Devstral, you can partially recover the path by examining the agent’s tool call log if your harness exposes one — the sequence of file reads, shell commands, and error responses that led to the final diff. If your setup does not expose that log, review the diff for structural coherence explicitly: ask whether each non-obvious decision is consistent with the surrounding code or whether it reads like a response to a specific runtime failure. Decisions that look defensive without context often are.
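
If your harness writes a structured run log, a short script can surface exactly those mid-run failures. The JSONL schema below (type, command, exit_code, stderr fields) is hypothetical; substitute whatever fields your harness actually records.

```python
# Sketch: recover the error-recovery path hidden behind a final diff
# by listing every shell command that failed during the agent run.
# The log format here is hypothetical -- adapt the field names to
# whatever your agent harness actually emits.
import json
from pathlib import Path


def failed_steps(log_path: str):
    """Yield (command, stderr) pairs for shell calls that failed."""
    for line in Path(log_path).read_text().splitlines():
        event = json.loads(line)
        if event.get("type") == "shell" and event.get("exit_code", 0) != 0:
            yield event["command"], event.get("stderr", "")


if __name__ == "__main__":
    for cmd, err in failed_steps("devstral_run.jsonl"):
        print(f"FAILED: {cmd}\n  stderr: {err[:200]}")
```

Any guard clause or extra abstraction in the final diff that lines up with one of these failures is a response to a runtime error, not a design decision, and deserves to be reviewed as such.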

3. Selective file-read completeness gap

One of Devstral’s strengths as a local agent is its ability to read your actual codebase during task execution. Unlike tools that work from a static snapshot or a truncated context window, Devstral can issue ls, cat, and grep commands against the live filesystem to build understanding before writing code. This is meaningfully better than working from stale context.

The trap is in the word “can.” Devstral reads files selectively, based on what it decides is relevant given the task description and the results of its prior reads. The code it writes reflects which files it actually read, not which files were relevant to the change. These two sets are not always the same. A function the agent modified may have callers in three different modules; the agent may have read one of them and written code that is compatible with that caller but breaks the other two. A configuration schema may behave differently under a runtime flag that is set in an environment file the agent never opened.

The completeness gap is invisible at review time. The diff looks self-consistent because Devstral wrote coherent code relative to what it understood. The reviewer cannot tell from the diff which files the agent consulted. The practical fix is a dependency check pass separate from the correctness pass: after reviewing the diff for what the code does, identify every other file that the changed code depends on or that depends on the changed code, and verify those relationships directly rather than assuming the agent’s file sampling covered them. On a mid-sized codebase this takes two to five minutes. It catches the completeness gap that the diff alone cannot expose.
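
Here is a rough sketch of that dependency pass for a Python codebase tracked in git: list the files the diff touched, then use git grep to find files that import them. The module-name derivation is naive, so adjust it to your package layout; the point is to enumerate reverse dependencies mechanically instead of trusting the agent's file sampling.

```python
# Rough sketch of the dependency pass for a Python codebase in git:
# list the files the agent changed, then grep the repo for modules
# that import them. This catches callers the agent may never have read.
import subprocess


def changed_files() -> list[str]:
    """Python files modified in the working tree relative to HEAD."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [f for f in out.stdout.splitlines() if f.endswith(".py")]


def reverse_deps(path: str) -> list[str]:
    """Files that mention the changed module's dotted path."""
    # Naive: assumes imports reference the file's dotted path directly.
    module = path.removesuffix(".py").replace("/", ".")
    out = subprocess.run(  # git grep exits 1 on no match, so no check=True
        ["git", "grep", "-l", module],
        capture_output=True, text=True,
    )
    return [f for f in out.stdout.splitlines() if f != path]


if __name__ == "__main__":
    for changed in changed_files():
        deps = reverse_deps(changed)
        print(f"{changed} <- read these before approving: {deps or 'none found'}")
```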

How to use Devstral without outsourcing your judgment

None of these traps argue against using Devstral. For developers who want a capable open-weights coding agent that runs entirely on local infrastructure — no API keys, no cloud dependency, full data locality — Devstral is a significant tool. The traps are specific to the local execution model that makes Devstral attractive: the same proximity that removes cloud dependency also removes the cognitive boundary that remote agents preserve by default.

The practical posture is three separate passes. First, write your own one-sentence description of what the code should do before reading the diff — this re-establishes the boundary that local execution erases. Second, review the diff for structural coherence, flagging any decision that looks defensive or non-obvious as a candidate for checking against the agent’s execution log. Third, run a dependency pass: identify the files the change touches and the files that depend on those files, and verify those connections directly. Each pass addresses one of the three traps. Together they make the local execution model safe to rely on without outsourcing the review judgment that the agent is not equipped to perform.

Devstral is Mistral’s bet that the best coding agent is one you own and run yourself. For the right infrastructure constraints, that bet is correct. The review habits above are what make it sustainable.


Related reading: Mistral Codestral covers the fill-in-the-middle completion traps from the same model family. Aider examines a similar local terminal-agent workflow with its own diff-acceptance traps. TabbyML explores the self-hosted correctness halo problem that applies to any locally run model. Refact.ai covers another self-hosted coding assistant with overlapping deployment patterns. For a full comparison of AI coding agent tools, see the best AI coding tools 2026 roundup.

Don’t let local execution feel like code you wrote yourself

ZenCode prompts you to reset before reviewing agent output — one question that re-establishes the boundary the local model erased.

Try ZenCode free
