Research Explainer · Liu et al. (2026)

Claude Code is a thin agent loop wrapped in a thick safety harness

A source-level reading of Anthropic's coding agent finds that about 1.6% of the code is AI decision logic. The other 98.4% is permission gates, context compaction, extensibility plumbing, and recovery.

Published April 2026

1.6% of the codebase is AI decision logic; the remaining 98.4% is operational harness

93% of permission prompts are approved by users, which is why deny-first rules and sandboxing exist as independent layers

7× more tokens consumed by agent teams in plan mode versus a standard session, motivating summary-only subagent returns

5 sequential context-compaction layers run before every model call, from per-tool budgets to full summarisation

The thesis

Anthropic publishes user docs for Claude Code, not architecture docs. Liu and colleagues at MBZUAI's VILA Lab read the extracted TypeScript source for v2.1.88 line by line and wrote down what they found. The result is a 46-page reverse-engineering of a production coding agent that runs shell commands, edits files, and calls external services on a developer's behalf.

Their central finding is a ratio. By community estimate, roughly 1.6% of the codebase is the agent's decision logic. The other 98.4% is the harness: permission gates, tool routing, context compaction, extensibility plumbing, recovery logic, session persistence. The model decides; everything else exists to make sure it can decide safely and that its decisions actually run.

This inverts the dominant pattern in agent engineering. Frameworks like LangGraph push decision logic into developer-defined state graphs. Devin pairs an explicit planner with the model. Claude Code goes the other way: give the model maximum latitude inside a deterministic environment so rich that good judgment is the only thing it needs to supply.

Figure. Trust drift: auto-approve rate grows with experience. Source: McCain et al. (2026), cited in Liu et al., Section 2.1. Longitudinal Claude Code usage data showing how the share of permission requests users auto-approve rises with cumulative sessions.

Figure. The five-layer compaction pipeline. Source: Liu et al. (2026), Sections 4.3 and 7.3. The five context shapers run in order before every model call, with each layer more disruptive than the last. Bar heights show the qualitative aggressiveness ranking described in the paper, not measured token reductions.

Five values, thirteen principles, one loop

The authors extract five values that motivate the architecture: human decision authority, safety and security, reliable execution, capability amplification, and contextual adaptability. Each maps to design principles, and each principle maps to specific source files.

The core is unromantic. A queryLoop() async generator runs the model, parses tool-use blocks, checks permissions, dispatches tools, collects results, and loops until the model stops producing tool calls. That is it. The same loop serves the interactive CLI, the headless CLI, the SDK, and IDE integrations. Only the rendering layer changes.
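The shape of that loop can be sketched in a few lines of TypeScript. This is a minimal illustration, not the Claude Code source: the Harness interface, the event types, and the maxTurns cap are all assumptions made for the example, and the name agentLoop is hypothetical.

```typescript
// Hypothetical types for illustration; none of these names come from the Claude Code source.
type ToolCall = { tool: string; input: string };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type LoopEvent =
  | { kind: "assistant"; text: string }
  | { kind: "tool_result"; tool: string; output: string }
  | { kind: "denied"; tool: string };

// Everything the loop needs from its surroundings (assumed interface).
interface Harness {
  callModel(history: Message[]): Promise<ModelTurn>;
  isPermitted(call: ToolCall): boolean;
  runTool(call: ToolCall): Promise<string>;
}

async function* agentLoop(
  harness: Harness,
  prompt: string,
  maxTurns = 20, // assumed safety cap, not taken from the paper
): AsyncGenerator<LoopEvent> {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await harness.callModel(history);
    history.push({ role: "assistant", content: reply.text });
    yield { kind: "assistant", text: reply.text };

    // No tool calls: the model has stopped asking for work, so the loop ends.
    if (reply.toolCalls.length === 0) return;

    for (const call of reply.toolCalls) {
      if (!harness.isPermitted(call)) {
        // A denial is fed back to the model as a tool result.
        history.push({ role: "tool", content: `denied: ${call.tool}` });
        yield { kind: "denied", tool: call.tool };
        continue;
      }
      const output = await harness.runTool(call);
      history.push({ role: "tool", content: output });
      yield { kind: "tool_result", tool: call.tool, output };
    }
  }
}
```

Because the generator only yields events, a CLI, an SDK, or an IDE plugin could each consume the same stream and differ only in how they render it, which is the property the paper highlights.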

Everything interesting happens around the loop. Seven independent safety layers, from blanket tool pre-filtering through deny-first rules through an ML classifier through shell sandboxing, must all be satisfied before a tool runs. Five context shapers compact the conversation before every model call. Four extension mechanisms (MCP servers, plugins, skills, hooks) plug in at different context costs. Subagents spawn with isolated context windows and return only summaries to the parent.
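The "all layers must be satisfied" rule is easiest to see as a chain in which any single layer can veto. The sketch below uses three invented stand-in layers rather than the seven the paper describes; the layer names, the ToolRequest shape, and the toy checks are assumptions for illustration only.

```typescript
// Hypothetical request and layer shapes; not taken from the Claude Code source.
type ToolRequest = { tool: string; command: string };
type SafetyLayer = { name: string; check: (req: ToolRequest) => boolean };
type Verdict = { allowed: boolean; blockedBy: string | null };

// Three toy layers standing in for the seven described in the paper.
const layers: SafetyLayer[] = [
  // Tool pre-filter: only tools on a fixed list are even considered.
  { name: "tool-prefilter", check: (r) => r.tool === "read" || r.tool === "bash" },
  // Deny rule: a toy pattern match on the command text.
  { name: "deny-rules", check: (r) => r.command.indexOf("rm -rf") === -1 },
  // Sandbox: a toy check that the command does not climb out of the workspace.
  { name: "sandbox", check: (r) => r.command.indexOf("..") === -1 },
];

// A request runs only if every layer allows it; the first refusal wins.
function evaluate(req: ToolRequest, stack: SafetyLayer[]): Verdict {
  for (const layer of stack) {
    if (!layer.check(req)) return { allowed: false, blockedBy: layer.name };
  }
  return { allowed: true, blockedBy: null };
}
```

Each layer here sees only the request and knows nothing about the others, so one layer failing open does not silently disable the rest. That is the independence property the design relies on.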

Safety as a response to approval fatigue

The deny-first design exists because Anthropic measured that users approve 93% of permission prompts. Interactive confirmation is behaviourally unreliable, so safety cannot rest on human vigilance. The architectural response is not more dialogs but fewer: defined boundaries within which the agent can act freely, and multiple independent layers that block bad actions whether or not the user is paying attention.
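Deny-first precedence can be illustrated with a small rule resolver. This is a sketch under stated assumptions: the Rule shape, the use of regular expressions, and the fallback to "ask" when nothing matches are choices made for the example, not details confirmed by the paper.

```typescript
// Hypothetical rule format; the real permission rule syntax is not shown here.
type Decision = "deny" | "allow" | "ask";
type Rule = { effect: "deny" | "allow"; pattern: RegExp };

function decide(command: string, rules: Rule[]): Decision {
  // Deny rules are consulted first, so a matching deny always beats a matching allow.
  if (rules.some((r) => r.effect === "deny" && r.pattern.test(command))) return "deny";
  // Allow rules define the boundary inside which no prompt is needed.
  if (rules.some((r) => r.effect === "allow" && r.pattern.test(command))) return "allow";
  // Assumed fallback: anything unmatched is escalated to the user.
  return "ask";
}

// Example rule set: git is generally allowed, but force pushes are always blocked.
const exampleRules: Rule[] = [
  { effect: "allow", pattern: /^git / },
  { effect: "deny", pattern: /push --force/ },
];
```

With these rules, a force push is refused even though it also matches the broader git allow rule, and only commands outside both lists produce a prompt. Fewer prompts reach the user, and the ones that do are less likely to be waved through.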

That defence-in-depth rests on an independence assumption that breaks under pressure. Security researchers documented that shell commands with more than 50 subcommands fall back to a single generic prompt, a shortcut that exists because per-subcommand parsing froze the UI. When safety layers share performance constraints, they can degrade together.

Two CVEs disclosed in 2025 and 2026 exploit a temporal gap the paper makes explicit: extensions (hooks, MCP servers) load before the trust dialog, creating a pre-trust window where the permission system is not yet engaged. The spatial diagram is correct; the temporal ordering is the attack surface.

Context as the binding constraint

The five-layer compaction pipeline reflects a clear position: the context window is the scarce resource. No single strategy handles every kind of pressure, so the layers escalate. Budget reduction caps individual tool outputs. Snip trims old history. Microcompact does cache-aware compression. Context collapse projects a virtual view. Auto-compact summarises the lot.
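The escalation idea can be sketched as an ordered list of shapers, each more destructive than the last. The example below is a simplification under several assumptions: it measures size in characters rather than tokens, it stops as soon as the conversation fits (the paper's exact triggering conditions are not reproduced here), and it implements toy versions of only three layers. Microcompact and context collapse are left out because their behaviour depends on caching and view-projection details this explainer does not give.

```typescript
// Hypothetical message and shaper types for illustration.
type Msg = { role: "user" | "assistant" | "tool"; text: string };
type Shaper = { name: string; apply: (msgs: Msg[]) => Msg[] };

// Character count as a crude stand-in for a token count.
const size = (msgs: Msg[]): number => msgs.reduce((n, m) => n + m.text.length, 0);

// Least disruptive: cap each tool output at 200 characters (assumed cap).
const budgetReduction: Shaper = {
  name: "budget-reduction",
  apply: (msgs) =>
    msgs.map((m) =>
      m.role === "tool" && m.text.length > 200 ? { ...m, text: m.text.slice(0, 200) } : m,
    ),
};

// More disruptive: keep only the four most recent messages (assumed window).
const snip: Shaper = { name: "snip", apply: (msgs) => msgs.slice(-4) };

// Most disruptive: replace everything with a single summary placeholder.
const autoCompact: Shaper = {
  name: "auto-compact",
  apply: (msgs) => [{ role: "assistant", text: `[summary of ${msgs.length} messages]` }],
};

// Apply shapers in order, stopping once the conversation fits the limit.
function shape(msgs: Msg[], limit: number, shapers: Shaper[]): { msgs: Msg[]; applied: string[] } {
  const applied: string[] = [];
  let current = msgs;
  for (const s of shapers) {
    if (size(current) <= limit) break;
    current = s.apply(current);
    applied.push(s.name);
  }
  return { msgs: current, applied };
}
```

Ordering the shapers from cheapest to most destructive means that a single oversized tool output is handled by truncation alone, and full summarisation is reached only when nothing gentler is enough.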

CLAUDE.md memory is plain Markdown in a four-level hierarchy from OS-managed to gitignored-local. The authors note this is a deliberate trade: file-based memory is inspectable, editable, and version-controllable, where embedding-based retrieval is none of those things. The cost is that compliance with CLAUDE.md instructions is probabilistic, since the model may or may not follow them; deterministic enforcement lives in the permission rules.

The flip side is opacity. Context collapse operates without user-visible output. Microcompact's cache-aware decisions depend on prompt caching the user cannot see. Five interacting layers, several behind feature flags, produce behaviour difficult to predict from any single config file.

What's missing, and what comes next

The paper applies a sixth concern as an evaluative lens rather than a design value: long-term human capability preservation. The architecture amplifies short-term productivity but offers limited mechanisms that explicitly preserve long-term understanding or codebase coherence.

The supporting evidence comes from adjacent work, not from Claude Code directly. A 16-developer RCT (Becker et al. 2025) found that AI tools made experienced developers 19% slower, even though the developers believed they had been about 20% faster. A causal analysis of 807 Cursor-using repositories (He et al. 2025) found code complexity rose 40.7%, with the initial velocity spike dissipating to baseline by month three. An EEG study (Kosmyna et al. 2025) found weakened neural connectivity in LLM users that persisted after the tool was removed.

The OpenClaw comparison is the paper's sharpest move. The same design questions recur, but a multi-channel personal assistant gateway answers them differently: perimeter-level access control instead of per-action evaluation, gateway-wide plugin registries instead of per-window context costs, multi-agent routing instead of subagent delegation. The two systems can even compose, with OpenClaw hosting Claude Code as a coding harness. The design space is layered, not flat.

Claude Code's design says something direct: in agentic coding, the hard problem is not deciding what to do but ensuring that decisions are safe to execute, cheap to represent, and recoverable when wrong. The 1.6/98.4 ratio is not a criticism of the AI core; it is a description of what production deployment actually requires.

References

Liu, J., Zhao, X., Shang, X., & Shen, Z. (2026). Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems. arXiv preprint arXiv:2604.14228. https://arxiv.org/abs/2604.14228

Hughes, J. (2026). Claude Code auto mode: A safer way to skip permissions. Anthropic Engineering. https://www.anthropic.com/engineering/claude-code-auto-mode

He, H., Miller, C., Agarwal, S., Kästner, C., & Vasilescu, B. (2025). Speed at the cost of quality: How Cursor AI increases short-term velocity and long-term complexity in open-source projects. arXiv preprint arXiv:2511.04427. https://arxiv.org/abs/2511.04427

Becker, J., Rush, N., Barnes, E., & Rein, D. (2025). Measuring the impact of early-2025 AI on experienced open-source developer productivity. arXiv preprint arXiv:2507.09089. https://arxiv.org/abs/2507.09089

McCain, M., Millar, T., Huang, S., et al. (2026). Measuring AI agent autonomy in practice. Anthropic Research Blog. https://anthropic.com/research/measuring-agent-autonomy