Goose by Block: how to review code when a local AI agent uses tools, browses the web, and edits files autonomously

2026-04-29 · 5 min read · ZenCode

Goose is an open-source AI coding agent released by Block — the company behind Square and Cash App. Unlike cloud-based agents that run in isolated sandboxes and deliver results through pull requests, Goose runs entirely on your local machine. It connects to whichever AI provider you configure, then works through a sequence of tool calls: reading and writing files, running shell commands, browsing web pages, querying APIs, and interacting with developer services like GitHub or Jira. When you describe a task, Goose plans and executes autonomously, using whatever combination of tools its reasoning decides are needed, until it concludes the task is complete or runs into something it cannot resolve.

This architecture — local execution, broad tool access, configurable AI backend — creates a category of review problem that is distinct from both inline completion tools and remote agent platforms. The code Goose writes often looks unremarkable. The challenge is that reviewing it as if it were written by a developer or a simpler completion tool misses the ways the context in which it was produced shapes what it contains. Three traps follow from how Goose actually works.

The three Goose attention traps

1. Tool-sequence opacity

When Goose completes a task, it presents you with a summary of what it did and the resulting file changes. What the summary does not capture in reviewable form is the full tool-call sequence: which files it read before writing, which shell commands it ran and what they returned, which web pages it fetched, and how those observations shaped each decision. A function in the final diff may encode an assumption Goose formed by reading a Stack Overflow answer, a pattern it extracted from a file you did not expect it to read, or a constraint it inferred from the output of a command that is no longer visible.

This differs from reviewing code written by a developer because a developer’s mental model can be interrogated: you can ask why they made a choice and get an answer that reflects the actual reasoning. Goose’s tool sequence is logged within the session, but by the time you are reviewing the diff, that sequence is typically no longer in front of you. What you see is the outcome of a reasoning process whose inputs are partially hidden.

The review correction is to treat every non-obvious choice as externally sourced until you can explain it from the code itself. If a constant, path, or API shape appears in the code without an obvious origin in the codebase, investigate rather than assume it was derived from the task description. Goose often gets things right, but the path it took to get there — reading your package.json, fetching a library’s changelog, running ls on a directory — can introduce specificity that is harder to spot than a hallucination would be, because the value looks plausible and often is correct under the exact conditions Goose observed.
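
To make that concrete, here is a hedged sketch of the kind of function this trap produces. Every name and value below (the endpoint, the retry count, the backoff interval) is invented for illustration rather than taken from real Goose output; the point is that each literal reads as plausible, and none of them can be traced to the task description without checking.

```python
# Hypothetical illustration: a helper an agent might produce for a task like
# "add retry logic to the sync client". All names and values are invented.
import time

import requests

SYNC_ENDPOINT = "https://api.example.com/v2/sync"  # Why v2? The task said nothing about API versions.
MAX_RETRIES = 5                                    # From the task, an existing constant, or a web page the agent read?
BACKOFF_SECONDS = 0.75                             # Oddly specific; likely observed or copied from somewhere.


def sync_with_retry(payload: dict) -> requests.Response:
    """Retry the sync call with a fixed pause between attempts."""
    last_error = None
    for _ in range(MAX_RETRIES):
        try:
            response = requests.post(SYNC_ENDPOINT, json=payload, timeout=10)
            response.raise_for_status()
            return response
        except requests.RequestException as exc:
            last_error = exc
            time.sleep(BACKOFF_SECONDS)
    raise RuntimeError("sync failed after retries") from last_error
```

A reviewer who can say where each of those three literals came from has done the work this trap demands; a reviewer who cannot is trusting a tool-call sequence they never saw.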

2. Local-state contamination

Because Goose runs on your machine with access to your filesystem, it naturally reads context that remote agents do not see. Configuration files, environment files, build artifacts, local git state, installed packages — all of these are readable by Goose and may inform the code it writes. In most cases this is useful: Goose can read your actual configuration instead of guessing at it. The problem appears when local state that is specific to your environment enters the generated code as a hardcoded assumption.

Common forms: an absolute path that only exists on your machine, a port number taken from a local .env file rather than a configuration constant, a dependency version pinned to what you happen to have installed locally rather than what the project specifies, an API endpoint observed from local logs that has not been formally documented as the canonical one to use. Each of these passes review because it is correct — on your machine, right now. The failure appears in CI, in production, or on a teammate’s machine where the local state Goose read does not exist.
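
A hedged sketch of what that contamination looks like in one place. Every path, port, version, and address below is hypothetical, but each line corresponds to one of the forms above, and each was presumably correct on the machine where the agent ran.

```python
# Hypothetical example of local-state contamination; all values are invented.
from pathlib import Path

DATA_DIR = Path("/Users/alice/projects/acme/data")      # absolute path that exists only on the author's machine
REDIS_URL = "redis://localhost:6380"                    # port copied from a local .env, not the project default
REQUIRED_PANDAS = "2.1.3"                               # pinned to whatever happened to be installed locally
METRICS_ENDPOINT = "http://192.168.1.42:9090/api/push"  # address observed in local logs, never documented


def load_raw_events() -> list[Path]:
    """Collect raw event files from the data directory."""
    return sorted(DATA_DIR.glob("*.jsonl"))
```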

The fix is to read every literal value in the diff with explicit attention to where it came from. Values that cannot be traced to the task description, an existing configuration constant, or a documented external source should be treated as potentially contaminated. This is the same review discipline required for any code that was written with access to a specific environment, but it is easy to skip when the code otherwise looks correct and the tests pass locally.
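
Using the same hypothetical names, a version that survives that scrutiny makes the source of each value explicit: configuration with documented defaults rather than whatever the agent happened to observe.

```python
# The same sketch with each literal traced to a source the project controls.
import os
from pathlib import Path

# Resolved from configuration, with a repository-relative default,
# instead of one developer's home directory.
DATA_DIR = Path(os.environ.get("ACME_DATA_DIR", "data"))

# Defaults to the project's documented development port; overridable per environment.
REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379")


def load_raw_events() -> list[Path]:
    """Collect raw event files from the configured data directory."""
    return sorted(DATA_DIR.glob("*.jsonl"))
```

The dependency pin and the metrics endpoint from the contaminated version have no place in application code at all; they belong in the project's dependency manifest and service configuration, where their origin is reviewable.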

3. Provider-model inconsistency

Goose is model-agnostic. You can configure it to use Claude, GPT-4o, Gemini, a locally running model via Ollama, or any other provider that exposes a compatible API. This is one of its practical strengths: you choose the model that fits your cost, capability, or privacy constraints. For code review, it introduces a form of inconsistency that does not exist with tools that use a fixed backend.

Different models have systematically different tendencies in how they write code. One model prefers verbose defensive checks; another produces tighter logic with fewer guards. One tends to inline error handling; another extracts it into helpers. One consistently chooses one abstraction layer; another reaches for a different one. These tendencies are not random — they are stable products of a given model’s training. A reviewer who has developed intuitions about code from one model may apply those intuitions to code from a different model and draw incorrect conclusions: reading the absence of defensive checks as confidence where it is actually a stylistic difference, or reading verbose structure as over-engineering where it reflects a different model’s idiom.
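
To see why this misreading happens, compare two hypothetical renditions of the same small task, reading a port from an environment variable. Neither is attributed to a real model and neither is wrong; they illustrate the kind of stable stylistic gap a reviewer calibrated on one backend will misinterpret when looking at the other.

```python
# Two invented renditions of "read the app port from the environment".
import os


# Rendition A: verbose and defensive, guards every failure mode explicitly.
def get_port_defensive(default: int = 8080) -> int:
    raw = os.environ.get("APP_PORT")
    if raw is None:
        return default
    raw = raw.strip()
    if not raw.isdigit():
        return default
    port = int(raw)
    if port < 1 or port > 65535:
        return default
    return port


# Rendition B: terse, trusts the caller, and surfaces bad input as an exception.
def get_port_terse(default: int = 8080) -> int:
    return int(os.environ.get("APP_PORT", default))
```

Rendition A is not necessarily more careful and rendition B is not necessarily more confident; read without knowing which backend produced them, either one can be mistaken for a design decision the author never made.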

This matters most in teams where different developers use different Goose configurations, or where the configuration changes between sessions. Code written by Goose with Claude as its backend and code written by Goose with a local model will look different in ways that are not obviously attributable to Goose. A reviewer evaluating the combined output has no single set of heuristics that applies cleanly to all of it. The practical mitigation is to record which provider was used in the commit message or PR description, and to recognize in review that model-origin differences are a legitimate signal rather than unexplained style variation.

What Goose shares with other local agents — and what it does not

The tool-sequence opacity trap is present in any agentic tool, including Claude Code, Cline, and OpenHands. All of them execute sequences of actions whose intermediate states are not carried into the code review interface. The local-state contamination trap is specific to agents that run on your machine with broad filesystem access — it is less acute with cloud sandboxes like Devin or Google Jules, which operate in clean, isolated environments where your local dotfiles do not exist. The provider-model inconsistency trap is most pronounced with Goose among mainstream tools, because Goose, more than most widely used agents, treats the underlying model as a fully configurable parameter rather than an internal implementation detail.

Goose’s openness is a genuine architectural advantage. Running locally means no data leaves your machine, no external service has access to your codebase, and you control the full execution environment. These properties matter for teams with strict data policies or for work on sensitive codebases. The review discipline they require is not a reason to avoid the tool — it is the cost of those properties, and it is lower than the equivalent cost of reviewing code from any other capable autonomous agent.

The practical summary: when reviewing Goose output, ask where each non-obvious value came from, treat every literal that references your local environment as a portability risk, and note which model produced the code so that model-specific tendencies can be distinguished from genuine design choices. The diff is accurate. The question is always what it is accurate about.

Related reading: Claude Code terminal agent review · Cline AI agent review · OpenHands autonomous agent review · Plandex AI multi-file agent review · How to review AI-generated code

Stay deliberate when reviewing local agent output

ZenCode helps developers build review habits for code generated by autonomous agents running on their own machine.

Get ZenCode
