Papers
Topics
Authors
Recent
Search
2000 character limit reached

Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems

Published 14 Apr 2026 in cs.SE, cs.AI, cs.CL, and cs.LG | (2604.14228v1)

Abstract: Claude Code is an agentic coding tool that can run shell commands, edit files, and call external services on behalf of the user. This study describes its comprehensive architecture by analyzing the publicly available TypeScript source code and further comparing it with OpenClaw, an independent open-source AI agent system that answers many of the same design questions from a different deployment context. Our analysis identifies five human values, philosophies, and needs that motivate the architecture (human decision authority, safety and security, reliable execution, capability amplification, and contextual adaptability) and traces them through thirteen design principles to specific implementation choices. The core of the system is a simple while-loop that calls the model, runs tools, and repeats. Most of the code, however, lives in the systems around this loop: a permission system with seven modes and an ML-based classifier, a five-layer compaction pipeline for context management, four extensibility mechanisms (MCP, plugins, skills, and hooks), a subagent delegation mechanism with worktree isolation, and append-oriented session storage. A comparison with OpenClaw, a multi-channel personal assistant gateway, shows that the same recurring design questions produce different architectural answers when the deployment context changes: from per-action safety classification to perimeter-level access control, from a single CLI loop to an embedded runtime within a gateway control plane, and from context-window extensions to gateway-wide capability registration. We finally identify six open design directions for future agent systems, grounded in recent empirical, architectural, and policy literature.

Summary

  • The paper’s main contribution is its multi-layered agent loop design that enforces safety through deny-first permissions and contextual adaptability.
  • It details advanced techniques like context compaction, structured tool orchestration, and extensible mechanisms (MCP, plugins, subagents) for autonomous code modification.
  • Comparative analysis with systems like OpenClaw highlights crucial trade-offs in performance, security, and extensibility, informing future AI agent designs.

Architectural Analysis of Claude Code: The Design Space of Agentic Coding Systems

Introduction and Motivating Values

"Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems" (2604.14228) presents a rigorous source-level dissection of Claude Code, an agentic software development tool engineered by Anthropic for autonomous code modification, shell execution, and tool orchestration. The architecture is contextualized within a framework grounded in five human values: human decision authority, safety/security/privacy, reliable execution, capability amplification, and contextual adaptability. These values inform thirteen observable design principles, including deny-first escalation, layered safety, append-only durable state, graduated trust spectrum, composable extensibility, and reversibility-weighted risk assessment (Section 2).

High-Level System Architecture

At the macro-level, Claude Code decomposes into seven interconnected components: user, interfaces, agent loop, permission system, tools, state persistence, and execution environment. The data flow is centralized through a core agent loop, which receives user input via varied surfaces (CLI, SDK, IDE, etc.), assembles context, invokes the LLM, evaluates structured tool-use requests through a multi-layered permission and safety gate, and dispatches concrete operations to the execution backends. Figure 1

Figure 1: High-level system structure of Claude Code, delineating seven functional components converging on a shared agent loop.

This modular decomposition is further expanded into a five-layered architecture spanning interaction surfaces, agentic core (loop and compaction), safety/action (permissions, hooks, sandboxing, subagents), memory and state, and execution backends. Figure 2

Figure 2: Layered architecture with five subsystem layers, mapping interfaces to core, safety, state, and backend responsibilities.

Agentic Turn Execution: The Query Loop

The core operational cycle is an iterative while-loop, instantiating the ReAct paradigm: the system cyclically builds context, calls the model, parses tool invocations, routes through permissions, executes tools, logs results, and determines loop continuation. The turn pipeline applies five sequential context shapers to manage prompt window pressure: budget reduction, snip, microcompact, context collapse, and auto-compact. Figure 3

Figure 3: Single agentic turn execution, illustrating context assembly, model invocation, permission gating, tool result integration, and compaction.

Recovery logic handles output token overflows, context collapse, reactive compaction, streaming fallbacks, and supports forced aborts and user intervention, ensuring resilience against both user and system error classes.

Permission System and Safety Boundary Enforcement

Claude Code’s safety posture is realized through an explicit, configurable permission system with deny-first semantics, integrating layered rule evaluation, an ML-based auto-mode classifier, shell sandboxing, and extensive pre/post-execution hooks. Seven distinct permission modes modulate agent autonomy and user intervention, supporting a graduated trust model. Figure 4

Figure 4: Permission gate overview, visualizing deny-first escalation, progressive trust spectrum, and composable policy mechanisms.

Empirical analysis reveals that users approve 93% of permission prompts, necessitating automated safeguards (classifier, sandbox) to counteract approval fatigue. Independence across safety layers is assumed but shown to degrade under joint performance bottlenecks, as evidenced by security analyses when parsing costs or UI constraints lead to permission bypasses.

Context Construction, Memory Management, and Compaction

Managing prompt context is treated as a principal resource constraint. Claude Code incorporates a multi-source context assembly pipeline (system prompt, CLAUDE.md hierarchy, auto memory, path-scoped rules, tool descriptions, conversation history, tool results, and compaction summaries). Figure 5

Figure 5: Context construction and memory hierarchy, mapping static/dynamic state to context window composition.

CLAUDE.md files provide a transparent, version-controllable memory mechanism, eschewing opaque embeddings or database-backed retrieval. The compaction pipeline is deliberately staged, progressively escalating from local, loss-minimizing reductions to global model-driven summary compaction, reflecting a lazy-degradation approach to context management.

Extensibility Mechanisms: MCP, Plugins, Skills, Hooks

Claude Code explicitly partitions extensibility across four orthogonal mechanisms, each aligned to distinct context and deployment costs: Model Context Protocol (MCP) for external tools, plugins as packages, skills as instruction-level injection, and event-driven hooks. Tool pool assembly integrates built-ins, MCP-registered tools, and deduplicates, optimizing for both expressivity and context efficiency. The presence of four rather than one or two extension primitives is justified by the need to balance deployment and context window costs while avoiding unnecessary overhead for lightweight integrations.

Subagent Delegation and Isolation

Task delegation is realized through the Agent tool, supporting both built-in and user-defined subagent profiles with strict context and permission isolation. Subagents can operate within worktree, remote, or in-process isolation modes, returning only summary outputs to the parent and preserving corpus traceability via sidechain transcripts. Figure 6

Figure 6: Subagent orchestration and isolation, showing multi-axis dispatch and the maintenance of independent permission and context windows.

This architecture enables isolated parallelism without risking context explosion or cross-agent contamination, a critical factor as agent teams or multi-role workflows scale in complexity.

Session Persistence, Recovery, and Auditability

Persistent state is maintained as mostly append-only JSONL transcripts, separating session, prompt history, and subagent sidechains. Session-scoped permissions are intentionally not restored on resume/fork, upholding a conservative safety invariant. Compaction boundaries and explicit markers enable conversation forking, resumption, and rollback, reflecting a principled bias toward auditability over query-oriented complexity. Figure 7

Figure 7: Separation of live session state from durable storage, enabling reliable context compaction and session recoverability without implicit permission propagation.

Comparative Architecture: Claude Code vs. OpenClaw

A comparative analysis with OpenClaw—a persistent, gateway-based multi-channel agent—reveals that while the core design questions persist across agent systems, their answers bifurcate under distinct deployment and threat models. Claude Code hyper-focuses on per-action safety, CLI-coding harnesses, and context compression; OpenClaw centralizes trust at the gateway level, employs manifest-driven plugin registries, and supports multi-agent routing with strong channel isolations. This comparison establishes the composability and spectrum of agentic architectures, highlighting fundamental differences in trust boundary, memory, orchestration, and extensibility.

Architectural Tensions and Theoretical Implications

Several architectural tensions are empirically and normatively addressed:

  • Approval fatigue versus safety: Despite human-in-the-loop designs, high prompt approval rates necessitate independent enforcement (classifier/sandbox).
  • Performance versus security: Safety layers can fail jointly under shared performance/resource constraints.
  • Extensibility versus attack surface: Rich hook/MCP/plugin configurations introduce pre-trust vulnerabilities and elevate operational security requirements.
  • Capability/reliability trade-off: Local autonomy and bounded context introduce codebase complexity drifts not mitigated by current compaction/memory schemes.

Empirical studies of adjacent systems corroborate predicted phenomena such as code complexity increases (+40.7% in Cursor deployments [he2026cursor]), technical debt persistence in AI-authored code [liu2026techdebt], and observable skill/comprehension atrophy in assisted workflows [shen2026skill, kosmyna2025brain].

Open Research Directions

The paper identifies critical research frontiers in agentic system design:

  • Agent Observability–Evaluation Gap: The absence of harness-native, session-level error detection and correction scaffolding is a key juncture for future architecture.
  • Durable cross-session and organizational memory: Beyond append-only transcripts and local CLAUDE.md, maintaining and curating a memory substrate spanning longitudinal and collaborative axes remains unsolved.
  • Harness boundary evolution: Decoupling session, harness, and sandbox layers for composable, independently replaceable interfaces.
  • Governance and external compliance: Regulatory pressures (e.g., EU AI Act) will directly inform future architectures' logging, transparency, and permissioning capabilities.
  • Long-term human capability preservation: The current architecture does not treat skill retention, codebase coherence, or developer Understanding as first-class concerns.
  • Horizon scaling: Supporting multi-session, program-scale dependability in agentic workflows, including compositional subagents and state, is not fully addressed.

Conclusion

Claude Code exemplifies a production agent system in which local model autonomy is strictly bounded and scaffolded by extensive deterministic infrastructure for safety, context management, extensibility, and session durability. The architecture operationalizes clear value commitments without converging on monolithic patterns, instead layering mechanisms to create redundancy, flexibility, and traceability. The empirical and analytic evidence underscores architectural limitations, particularly with respect to composable memory, scaling agentic horizons, and long-term human skill retention, which are increasingly central as coding agents transition from augmentative to autonomous roles. As the comparative findings with OpenClaw demonstrate, agentic architectures will continue to diverge and specialize, shaped both by deployment context and external regulatory and governance forces.

References

A comprehensive bibliography is included in the source document and provides a wide lens on the contemporary literature of agent system architectures, safety-by-design, context management, multi-agent patterns, permission models, and emergent evaluation frameworks.

(2604.14228)

Whiteboard

Explain it Like I'm 14

What this paper is about

This paper takes a “peek under the hood” of Claude Code, an AI helper that can write and fix code by itself. Instead of just guessing what Claude Code does from the outside, the authors read its publicly available TypeScript code to explain how it actually works. They also compare it to another open-source system called OpenClaw to show how different design choices appear when the product is used in a different way.

In short: the paper explains how a modern coding agent is built, why it’s built that way, and what this means for future AI helpers.

What questions the paper tries to answer

The authors focus on simple-but-important questions every AI coding helper must face:

  • How much freedom should the AI have, and how do humans stay in control?
  • How can the system stay safe and avoid doing harmful things?
  • How does the agent keep track of what’s going on when there’s only so much it can “remember” at once?
  • How can people plug in their own tools and customize the agent?
  • How can the agent split work into parts and ask “mini-helpers” (subagents) to assist?
  • What changes when the same ideas are used in different products, like a gateway assistant (OpenClaw) instead of a coding tool?

How the researchers studied it

Instead of running big experiments, the authors used three simple approaches:

  1. Code reading: They analyzed the publicly available Claude Code source (TypeScript) to map design choices—what parts exist, how they connect, and why.
  2. Design principles: They identified five human values that seem to shape the system (like safety and user control) and traced those values into 13 design principles that show up in the code.
  3. Comparison: They compared Claude Code to OpenClaw, an open-source agent system, to see how the same questions can lead to different designs depending on where and how the agent is used.

They also walk through a concrete example—“Fix the failing test in auth.test.ts”—to show how the system behaves step by step.

The main ideas and why they matter

The heart of the system is a simple loop

Claude Code runs a basic repeat cycle (a “while loop”): think → try a tool → look at results → repeat until done. This is like a student who plans a step, does it, checks the result, and keeps going.

Why it matters: Simple loops are reliable and easy to debug. Most of the “smarts” come from the model’s reasoning and the strong support systems wrapped around this loop.

Five human values guide the design

The system’s architecture reflects five core values:

  • Human decision authority: People stay in charge. Claude asks for permission for sensitive actions and shows what it’s doing.
  • Safety and security: Multiple layers of protection stop bad or risky actions.
  • Reliable execution: The agent tries to do what you meant and checks its work.
  • Capability amplification: It helps you do more than you could alone.
  • Contextual adaptability: It adapts to your project, tools, and preferences.

These values are turned into 13 hands-on design principles (for example, “deny first, ask later” for permissions, and “treat attention/memory space as scarce and manage it carefully”).

Safety works like layered security at an airport

Permissions are “deny first, escalate to human.” There are several independent safety layers—rules, a machine-learning safety classifier, optional shell sandboxing, and hooks that can intercept actions. If any layer says “no,” the action is blocked.

Why it matters: Defense-in-depth reduces the chance of a single failure causing trouble. It also builds trust: users can see and control what the agent is allowed to do.

“Context window” is treated like a backpack with limited space

AI models can only “see” a limited amount of information at once (their context window). Claude Code uses a five-step “compaction” pipeline to fit more useful info into that space:

  • Trim oversized tool outputs
  • Snip old details over time
  • Micro-compact to handle overhead
  • Collapse very long histories into summaries
  • Auto-compact as a last resort using semantic compression

It also avoids loading unnecessary instructions until needed and limits how much subagent chatter flows back to the main chat.

Why it matters: Good memory management keeps the agent focused and accurate over long tasks.

Extensibility: many ways to plug things in

The system isn’t locked down. You can extend it through MCP (a protocol for tools), plugins, skills, and lifecycle hooks. These options sit at different “context costs”—some are lightweight, others more powerful but heavier.

Why it matters: People and teams can tune the agent to their tools and workflows without breaking the core system.

Subagents: small helpers with boundaries

The main agent can spawn subagents (like specialized helpers) that work in isolation and then return a short summary, not their full conversation.

Why it matters: This prevents memory overload and keeps responsibilities clear, while still letting the agent divide and conquer tasks.

Append-only logs: a diary you don’t edit

The agent keeps append-only session files (like a running diary). It records what happened, which helps with resuming work, auditing actions, and understanding past decisions.

Why it matters: Durable, transparent history supports reliability and accountability.

Same questions, different answers: Claude Code vs. OpenClaw

When the deployment context changes, design choices change too. The paper shows that:

  • Safety: Claude Code often checks each action; OpenClaw leans toward gateway-level access control (like a perimeter fence).
  • Runtime: Claude Code centers on a single CLI loop; OpenClaw embeds the agent inside a gateway that connects many channels.
  • Capabilities: Claude Code carefully packs context; OpenClaw emphasizes registering capabilities across the whole gateway.

Why it matters: There’s no one “right” architecture—what’s best depends on where the agent lives and what it’s for.

Open questions for the future

The authors call out six areas for future work, such as:

  • Better ways to see and evaluate what agents are doing (observability).
  • Smarter memory and learning across sessions.
  • Clearer boundaries between the model’s reasoning and the system’s execution.
  • Handling longer, more complex tasks over time (horizon scaling).
  • Governance and policy tools for responsible use.
  • Measuring whether agents help or hurt long-term human skill growth.

They also note a concern: if people rely too much on AI, their own skills might fade. Claude Code boosts short-term productivity, but it has fewer features aimed at preserving long-term human understanding.

Why this is important

  • For users: It explains how modern coding agents keep you in control, stay safe, and still speed you up.
  • For builders: It offers a map of design choices and trade-offs—how to handle permissions, memory limits, extensions, and subagents.
  • For researchers and policy makers: It highlights the values behind the tech and where future improvement is needed—especially for safety, trust, and human skill development.

Simple takeaway

Claude Code is like a careful, helpful apprentice: it works in small steps, asks permission for risky things, keeps a tidy log of what it did, and adapts to your tools. The paper shows how this is achieved in code and why those choices matter—and it points the way toward even safer, smarter, and more human-centered AI agents in the future.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

The paper offers a detailed, source-level architectural analysis of Claude Code and a qualitative contrast to OpenClaw. However, multiple areas remain underexplored or empirically unvalidated. Future research could address the following gaps:

  • Empirical safety evaluation: Quantify false-positive/false-negative rates of the auto-mode classifier and the deny-first permission system across benign, ambiguous, and adversarial scenarios; release an open evaluation harness and red-teaming datasets for reproducibility.
  • Layered-safety interactions: Analyze how the seven safety layers compose in practice (ordering, redundancy, masking effects, and failure modes), including conflicts between hooks, permission rules, sandboxing, and classifier outcomes.
  • Sandboxing efficacy and bypasses: Systematically test shell sandbox constraints against realistic attack vectors (path traversal, symlink races, env var poisoning, subprocess escapes, toolchain exploits), and measure residual risk on different OSes.
  • Hook and extensibility attack surface: Map and test the security boundary of hooks, skills, plugins, and MCP servers (supply-chain risks, privilege escalation, cross-plugin interference), and propose least-privilege capability schemas and isolation patterns.
  • MCP trust and interop: Establish trust models and permission negotiation protocols for MCP servers (origin attestation, capability scoping, revocation), and evaluate interoperation with heterogeneous agent ecosystems beyond MCP.
  • Permission UX and habituation: Measure the cognitive load and habituation dynamics of permission prompts under different modes; test UI/UX designs that improve calibration without increasing friction (e.g., risk-tiered prompts, reversible-by-default flows).
  • Trust trajectory management: Develop quantitative models of trust evolution (calibration, over-trust, under-trust) and interventions that keep trust aligned with true risk as autonomy increases over long-term use.
  • Long-term human capability preservation: Design and evaluate mechanisms that explicitly sustain developer understanding and supervision skills (e.g., self-explanation checkpoints, “teach-back” prompts, codebase health nudges, evaluative separation of generation vs review).
  • Observability–evaluation gap: Define standardized telemetry and evaluation signals (tool success semantics, side effects, invariants, postconditions) to bridge streaming-level observability with task-level quality and safety metrics.
  • Context compaction trade-offs: Quantify the impact of each compaction layer (budget reduction, snip, microcompact, collapse, auto-compact) on accuracy, latency, and hallucinations; develop adaptive, learned policies for when and how to compact.
  • Memory and persistence limits: Evaluate the append-only JSONL approach at scale (storage growth, retrieval latency, PII exposure, tamper-evidence), and test cross-session memory strategies (promotion, decay, summarization quality) for codebase coherence.
  • Permission state across resume/fork: Study the safety/usability trade-offs of not restoring session-scoped permissions on resume; propose principled restoration policies (with provenance, expiry, and risk-tiering) and measure user impact.
  • Subagent orchestration and isolation: Measure how summary-only returns affect parent reasoning, debuggability, and failure propagation; explore scheduling, coordination patterns, and verification of subagent outputs under isolation constraints.
  • Determining the right harness boundary: Identify task regimes where minimal scaffolding underperforms explicit planners/graphs/tree-search; develop adaptive harness selection that toggles between reactive loops and structured orchestration.
  • Performance and cost envelopes: Provide quantitative benchmarks for latency, throughput, token spend, and tool concurrency under realistic workloads; investigate dynamic budgeting policies (early-exit heuristics, multi-model routing, effort control).
  • Capability coverage and tool idempotence: Audit built-in and MCP tools for coverage gaps, side effects, idempotence, and transactional semantics; introduce safeguards (dry-run modes, two-phase commits, invariants) for stateful operations.
  • Comparative, controlled evaluation vs OpenClaw: Move beyond qualitative contrast to controlled experiments that isolate deployment-context variables and quantify trade-offs (safety, autonomy, context efficiency, extensibility overhead).
  • Horizon scaling and checkpointing: Develop mechanisms for very long-horizon tasks (explicit goals/subgoals, milestones, verifiers, resumable checkpoints) and evaluate their effects on reliability and cost.
  • Governance and policy lifecycle: Specify machine-enforceable, versioned policy artifacts (provenance, diffing, rollback), and test policy update workflows (staging, canarying) under regulatory constraints and audit requirements.
  • Privacy and compliance: Assess data-minimization, encryption-at-rest/in-transit, retention, and access control for transcripts, histories, and sidechains; provide compliance-ready configurations and audit logging guarantees.
  • Model updates and compatibility: Investigate schema evolution and backward compatibility for tools, hooks, and MCP interfaces under model changes (context sizes, tool-use formats, thinking settings); propose robust migration strategies.
  • Generalizability and completeness of evidence: Validate that the analyzed source snapshot reflects production behavior (feature parity, configuration drift); document coverage gaps and divergences, and provide a protocol for continuous re-analysis as the codebase evolves.

Practical Applications

Immediate Applications

Below are concrete, deployable use cases that leverage the paper’s identified design choices (deny‑first safety posture, layered permissioning with an ML auto‑mode classifier, a five‑layer context compaction pipeline, MCP/plugins/skills/hooks extensibility, isolated subagent orchestration, and append‑only session storage).

  • Software development (enterprise): Secure agentic coding assistant in IDEs and CLI
    • What: Use the unified query loop, deny‑first permission system, shell sandboxing, and append‑only JSONL logs to let agents run tests, edit files, and refactor code with auditable trails.
    • Tools/workflows: IDE plugin + CLI harness; permission modes with human escalation; PreToolUse hooks for policy; session viewers for audit.
    • Assumptions/dependencies: Capable LLM with tool‑use; OS‑level sandboxing; policy configuration; developer buy‑in to review/approve prompts.
  • CI triage and repair bot (software/DevOps)
    • What: Headless CLI/SDK agent triggered by CI to reproduce failing tests, collect logs, propose patches, and open PRs within a deny‑first, reversible‑action‑first policy.
    • Tools/workflows: Streaming tool executor for faster multi‑tool runs; per‑tool output budgets; subagent “workers” scoped to individual tests with sidechain transcripts.
    • Assumptions/dependencies: CI integration; protected branches; pre‑configured permission modes and auto‑mode classifier thresholds.
  • DevOps task runner with reversibility‑weighted oversight (software/IT)
    • What: Automate routine shell operations (log collection, config reads) with light oversight while requiring explicit approval for state‑changing commands (deployments, migrations).
    • Tools/workflows: Shell tool + sandbox; permission rules that distinguish read‑only vs write actions; hooks that auto‑deny high‑risk scopes.
    • Assumptions/dependencies: Accurate action classification; well‑tuned policy rules; secure secrets handling outside the model.
  • Internal tool orchestration via MCP (software/productivity)
    • What: Connect to JIRA/GitHub/Slack or in‑house systems using MCP; let the agent coordinate issues, comments, and code reviews under layered policy enforcement.
    • Tools/workflows: MCP servers for services; tool pre‑filters; deferred tool schemas to control context; coordinator hooks for batch approvals.
    • Assumptions/dependencies: MCP server availability; service tokens/SSO; agreed access scopes per environment.
  • Prompt‑injection‑resistant browsing and fetches (security/software)
    • What: Use tool pre‑filtering, deny‑first evaluation, and hooks to sanitize and constrain web fetches and content ingestion.
    • Tools/workflows: Fetch/Browser tools with strict allow lists; PreToolUse hooks stripping dangerous parameters; auto‑mode classifier for suspicious requests.
    • Assumptions/dependencies: Up‑to‑date allow/deny policies; robust URL/domain scoping; monitoring for model drift.
  • Transparent project memory and configuration (education/software)
    • What: Manage agent instructions with CLAUDE.md hierarchy and directory‑specific files so students/teams can see and version control “what the agent knows.”
    • Tools/workflows: File‑based memory; lazy loading of nested instructions; course/project templates that include baseline CLAUDE.md.
    • Assumptions/dependencies: Team discipline to keep CLAUDE.md reviewed; VCS integration; model reliability with long instructions.
  • Reproducible HCI/SE studies of agent behavior (academia)
    • What: Use append‑only transcripts, sidechain subagent logs, and compaction summaries to reproduce and analyze agent sessions in experiments.
    • Tools/workflows: Session storage viewers; metrics on auto‑approve trajectories; replay harnesses using stored JSONL.
    • Assumptions/dependencies: Consent and privacy controls; storage management for logs; stable schema versions.
  • Safety policy “as code” pilots (policy/IT governance)
    • What: Externalize permissions and hooks so security teams can version and test policy changes without forking the agent codebase.
    • Tools/workflows: Rule sets in config; staging vs production modes; automated tests that simulate dangerous tool requests.
    • Assumptions/dependencies: Clear risk taxonomy; cross‑functional review; CI for policy bundles.
  • Local “auto‑mode within bounds” for individual developers (daily life/software)
    • What: Enable a graduated trust spectrum that auto‑approves clearly reversible actions (search, read, test) while prompting for edits or shell writes.
    • Tools/workflows: Auto‑mode ML classifier; reversible‑action rules; quick UI prompts; per‑session non‑restored elevated permissions.
    • Assumptions/dependencies: Reasonable classifier precision; user comprehension of modes; safe defaults after resume.
  • Plugin/skill marketplaces for agent extension (software ecosystem)
    • What: Distribute MCP servers and low‑context‑cost “skills” to add tools (e.g., database readers, documentation miners) without bloating prompts.
    • Tools/workflows: Skills registry; ToolSearch with deferred schemas; hooks for onboarding checks.
    • Assumptions/dependencies: Extension vetting; signature/identity of providers; dependency isolation.

Long-Term Applications

These applications require further research, scaling, standardization, or ecosystem maturity (e.g., more accurate classifiers, broader MCP adoption, improved observability, and refined governance).

  • Autonomous CI/CD with evaluators in the loop (software/DevOps)
    • What: Agent plans, tests, and deploys changes end‑to‑end, with evaluators and deny‑first gates at promotion points; rollback and audit are append‑only.
    • Dependencies: Strong evaluator‑optimizer patterns; higher‑fidelity sandboxes; formalized risk tiers; org‑wide IAM and policy-as-code.
  • Large‑scale multi‑agent refactoring (software)
    • What: Orchestrator spawns isolated subagents (per module/service) with sidechain logs and summary‑only returns to refactor sprawling codebases.
    • Dependencies: Horizon scaling and cross‑session coherence; conflict resolution across subagents; advanced context collapse and summarization quality.
  • Organization‑wide, cross‑session memory and governance (software/IT)
    • What: Persistent, curated memory across projects with retention policies, provenance, and human checkpoints to preserve codebase coherence over months.
    • Dependencies: New UX for memory management; robust RAG over audited stores; policies for retention/PII; evaluation against “capability preservation” metrics.
  • Sector‑regulated agent operations (healthcare, finance, gov)
    • What: Apply deny‑first, layered safety and append‑only audits to clinical admin, claims processing, or trading support bots that integrate via MCP.
    • Dependencies: Certification frameworks; domain policy packs; higher bar for sandboxing/segmentation; incident response playbooks.
  • Agentic IDEs with native policy and observability (software tools)
    • What: IDEs embed permission modes, hooks, compaction visualizers, and action diffs to make agent decisions legible and governable by teams.
    • Dependencies: Standards for permission APIs; interoperability with language servers; scalable UI for high‑volume telemetry.
  • Standardization and compliance for agent safety (policy/standards)
    • What: Regulatory or industry standards mandate deny‑first defaults, separation of reasoning/enforcement, and durable audit trails for agentic systems.
    • Dependencies: Consensus on baseline controls; conformance test suites; alignment with privacy and cybersecurity frameworks.
  • Autonomous IT operations with perimeter controls (IT/operations)
    • What: Agents handle patching, backups, and incident triage under perimeter‑level access control rather than per‑action prompts, with escalation paths.
    • Dependencies: Robust boundary enforcement; high‑precision action classification; integration with SIEM/SOAR; human‑override UX.
  • Robotics and physical process control with layered safeguards (robotics/industry)
    • What: Port the deny‑first, multi‑layer safety stack and isolated subagents to physical task planning and execution (e.g., warehouse or maintenance bots).
    • Dependencies: Verified low‑level controllers; strict sandbox equivalents (simulation/digital twins); real‑time override; liability frameworks.
  • Classroom and workforce upskilling against “capability atrophy” (education/policy)
    • What: Curricula and tools that surface agent reasoning, require human evaluation, and track comprehension to mitigate long‑term skill erosion.
    • Dependencies: Instrumentation that couples agent output with assessment; longitudinal studies; incentives for deliberate practice.
  • Cross‑channel personal assistant gateways (consumer productivity)
    • What: Gateway architectures (cf. OpenClaw contrast) that embed the loop within a control plane, register capabilities globally, and enforce perimeter‑level access for email, calendars, and files.
    • Dependencies: Unified capability registry; identity and consent flows; device and app adapters; privacy‑preserving data handling.
  • Live observability and evaluation tooling for agents (software/ML ops)
    • What: Tools to visualize compaction, permission decisions, and subagent orchestration; attach evaluators that score safety and quality in real time.
    • Dependencies: Streaming telemetry standards; privacy/redaction; robust evaluator models; organizational processes to act on signals.

Glossary

  • agent loop: The iterative control cycle that assembles context, calls the model, routes tool requests through safety checks, executes approved tools, and repeats. "All entry surfaces converge on the same agent loop."
  • agentic coding tool: A coding assistant that autonomously plans and executes actions (e.g., running commands, editing files) to achieve user goals. "Claude Code is an agentic coding tool that can run shell commands, edit files, and call external services on behalf of the user."
  • Agent SDK: A programmatic interface for integrating and controlling the agent from external applications or services. "The Agent SDK emits typed events via async generators."
  • append-oriented session storage: A persistence model that records session history by appending entries rather than mutating past state, aiding auditability and recovery. "append-oriented session storage"
  • auto-mode ML classifier: A machine-learning component that evaluates the safety of tool-use requests when automatic approvals are enabled. "auto-mode ML classifier"
  • budget reduction: A compaction strategy that caps or trims oversized tool outputs before model calls to manage context pressure. "Budget reduction targets individual tool outputs that overflow size limits."
  • CLAUDE.md: A user-visible, file-based configuration and memory system that provides hierarchical, version-controllable instructions for the agent. "The CLAUDE.md + memory subsystem provides a four-level instruction hierarchy"
  • compaction pipeline: A multi-stage sequence of context-reduction techniques applied before each model call to fit within the context window. "a five-layer compaction pipeline for context management"
  • context collapse: A compaction technique that summarizes long histories into shorter representations to preserve essential information. "Context collapse manages very long histories."
  • context window: The maximum token budget the model can attend to in a single call, treated as the binding resource constraint. "the context window (200K for older models, 1M for the Claude 4.6 series)"
  • defense in depth: A safety posture that uses multiple independent, overlapping mechanisms so any single layer can block unsafe actions. "defense in depth with layered mechanisms"
  • deny-first rule evaluation: A permission policy in which deny rules override ask and allow rules, with unrecognized actions escalated to humans. "Deny-first rule evaluation (#1{permissions.ts})"
  • externalized programmable policy: Safety and permission logic expressed via external configuration and lifecycle hooks rather than being hardcoded, enabling flexible governance. "externalized programmable policy"
  • gateway control plane: The centralized management layer of a gateway system into which an agent runtime can be embedded. "embedded runtime within a gateway control plane"
  • graduated trust spectrum: A permission model that offers multiple levels of autonomy and oversight, allowing users to move along a continuum of trust. "Graduated trust spectrum"
  • hook-based interception: A mechanism where lifecycle hooks can observe, modify, or block actions (e.g., tool invocations) before they execute. "Hook-based interception (#1{types/hooks.ts})"
  • isolated subagent boundaries: Architectural boundaries ensuring subagents run with separate context and permissions, limiting blast radius and context bloat. "isolated subagent boundaries"
  • MCP: A protocol-based extensibility mechanism that exposes tools and services to the agent via server connections. "MCP server connections"
  • microcompact: A compaction stage that reduces overhead (e.g., cache-related) by aggressively tightening message content before a model call. "Microcompact reacts to cache overhead."
  • minimal scaffolding with maximal operational harness: A design principle favoring a simple control loop and strong execution infrastructure over complex orchestration logic. "minimal scaffolding with maximal operational harness"
  • orchestrator-workers pattern: An agent workflow where a central orchestrator delegates tasks to worker subagents and aggregates results. "primarily uses the orchestrator-workers pattern for subagent delegation"
  • perimeter-level access control: A security approach that enforces access policies at the system boundary rather than via per-action approvals. "perimeter-level access control"
  • principal hierarchy: An authority structure that specifies who has ultimate control, ordered from organization to operators to end users. "principal hierarchy (Anthropic, then operators, then users)"
  • ReAct pattern: An interaction pattern where the model alternates between reasoning steps and tool-using actions, with results feeding back into subsequent reasoning. "Claude Code's reactive loop follows the ReAct pattern"
  • shell sandboxing: Executing shell commands in a restricted environment that limits filesystem and network access independent of permissions. "Shell sandboxing (#1{shouldUseSandbox.ts})"
  • sidechain transcripts: Separate, append-only logs for subagents’ conversations that keep their content out of the parent agent’s context. "Sidechain transcripts (#1{sessionStorage.ts:247})"
  • sibling abort controller: A coordination mechanism that cancels concurrent tool executions when one fails, preventing wasteful or unsafe continuation. "Sibling abort controller."
  • StreamingToolExecutor: A streaming execution component that begins running tools as soon as tool-use blocks arrive, reducing latency. "The primary path uses #1{StreamingToolExecutor}, which begins executing tools as they stream in from the model response"
  • subagent delegation: The practice of spawning child agents to handle subtasks, often returning summaries to conserve context. "a subagent delegation and orchestration mechanism"
  • tool_use blocks: Structured messages emitted by the model to request specific tool invocations, which the harness validates and executes. "tool_use blocks"

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 27 tweets with 649 likes about this paper.