
File-Native Agentic Systems

Updated 8 February 2026
  • File-native agentic systems are frameworks that use file-based context and native file operations to enable scalable and verifiable agentic reasoning.
  • They utilize explicit file-system APIs and persistent audit logs, ensuring modularity, reproducibility, and secure multi-agent coordination.
  • Inspired by the Unix philosophy, these systems optimize context structuring and dynamic branching to support robust applications in software engineering and data science.

A file-native agentic system is a large-language-model-driven agentic pipeline that manipulates and reasons over native file-system abstractions and file-based context, treating files as both context substrate and operational interface. Unlike monolithic prompt-based approaches, file-native architectures afford persistent, searchable, and composable context handling, enabling scalable memory, workflow auditability, and operational robustness across domains such as structured data querying, software engineering, and collaborative data science. Architectures span from autonomous single agents to collaborative multi-agent systems, leveraging explicit file-based APIs and formal context pipelines for scalable and verifiable agentic reasoning.

1. Conceptual Foundations and Formal Abstractions

File-native agentic systems generalize the longstanding Unix philosophy that “everything is a file” by treating all resources (structured schemas, logs, models, tools, and even external APIs) as files mounted in a unified namespace or agentic file system (AFS) (Piskala, 16 Jan 2026, Xu et al., 5 Dec 2025). Formally, resources are projected via a mount function $\mu : R \rightarrow N$, mapping resources $R$ into a rooted file namespace $N$. Each file node carries metadata $m : N \rightarrow M$ (with fields for provenance, ACLs, timestamps), and access is mediated by standardized interfaces (read, write, exec, list), generating persistently logged transactions. File-nativity is characterized by:

  • Direct use of file-system APIs as the primary tool interface ($\mathcal{T} \equiv \{\mathrm{createFile}, \mathrm{readFile}, ...\}$), with agent state as a snapshot of the file/directory tree (V et al., 18 Jan 2026).
  • All context assembly, storage, and transformation occur through explicit file operations, with full audit logs under /context/history/ or analogous namespaces.
  • Multi-agent protocols and composition are realized by exchanging file-based artifacts (plans, memory fragments, intermediate reasoning logs), facilitating traceability and modularity.

This approach subsumes both the tool-use capabilities enabled by the Model Context Protocol (MCP) (V et al., 18 Jan 2026) and the persistent, versioned context infrastructure advocated by the AIGNE framework (Xu et al., 5 Dec 2025).
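
The mount function, per-node metadata, and logged access interface described above can be illustrated with a minimal in-memory sketch. It is not the API of AIGNE, MCP, or any cited system: the class names `Node` and `Namespace`, the method `list_paths`, and the example paths are all hypothetical, and only `read`, `write`, and `list` from the interface are modeled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now():
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Node:
    """A file node in the namespace N, carrying metadata m(n)."""
    content: str
    provenance: str                       # where the resource came from
    acl: set = field(default_factory=lambda: {"read"})
    timestamp: str = field(default_factory=_now)


class Namespace:
    """Toy agentic file system: mounts resources and logs every access."""

    def __init__(self):
        self.nodes = {}      # path -> Node
        self.history = []    # append-only transaction log

    def mount(self, path, content, provenance, acl=("read",)):
        """mu: project a resource into the namespace at `path`."""
        self.nodes[path] = Node(content, provenance, set(acl))
        self._log("mount", path, ok=True)

    def read(self, path):
        node = self.nodes.get(path)
        ok = node is not None and "read" in node.acl
        self._log("read", path, ok)
        if not ok:
            raise PermissionError(path)
        return node.content

    def write(self, path, content):
        node = self.nodes.get(path)
        ok = node is not None and "write" in node.acl
        self._log("write", path, ok)      # denied attempts are logged too
        if not ok:
            raise PermissionError(path)
        node.content = content
        node.timestamp = _now()

    def list_paths(self, prefix="/"):
        self._log("list", prefix, ok=True)
        return sorted(p for p in self.nodes if p.startswith(prefix))

    def _log(self, op, path, ok):
        self.history.append({"op": op, "path": path, "ok": ok})
```

Because every operation, including a denied one, appends to `history`, the log alone is enough to replay what an agent attempted. A real system would persist this log as files rather than keep it in memory.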

2. Architectural Patterns and Agent Taxonomies

File-native agentic systems are classified by their workflow architecture and degree of specialization:

  • Single-Loop Agents: Simple sequential pipelines operating over a handful of files, well-suited for CRUD or simple transformation tasks (V et al., 18 Jan 2026).
  • Hierarchical/Modular Agents: Root agents decompose global workflows (e.g., repo-wide code refactoring) into per-file subtasks for parallel, granular processing (V et al., 18 Jan 2026).
  • Collaborative Multi-Agent Systems: Chains, star topologies, or mesh networks of agents with clearly typed I/O contracts (e.g., metadata analyzers, planners, code generators, and evaluators in CoDA) (Chen et al., 3 Oct 2025). Each agent consumes and produces file-based artifacts and metadata, supporting integrated multi-step reasoning and iterative refinement.
  • Context Engineering Pipelines: Explicit Context Constructor/Loader/Evaluator components for selection and streaming of context fragments under token budgets and relevance, with manifest files encoding provenance and rationales (Xu et al., 5 Dec 2025).
  • Sandboxed Agentic Environments: LLMs interact with a secured, containerized file system, enabling general reasoning, persistent memory, and dynamic resource acquisition via native file and script operations (Cheng et al., 22 Jan 2026).

All of these architectural families leverage file-based abstraction for modular context, isolation, reproducibility, and adaptation to large-scale, heterogeneously structured resources.
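
As a rough illustration of the context-engineering pattern above, the following sketch selects context fragments under a token budget and records a manifest with a rationale per fragment. The greedy relevance-first strategy, the 4-characters-per-token estimate, and the keys `path`, `text`, and `relevance` are assumptions made for this example, not details taken from the cited frameworks.

```python
def estimate_tokens(text):
    """Crude token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def build_context(fragments, budget):
    """Greedily pick the most relevant fragments that fit the token budget.

    `fragments` is a list of dicts with hypothetical keys
    "path", "text", and "relevance" (higher means more relevant).
    Returns (selected_paths, manifest); the manifest records why each
    fragment was included or skipped.
    """
    selected, entries, used = [], [], 0
    ranked = sorted(fragments, key=lambda f: f["relevance"], reverse=True)
    for frag in ranked:
        cost = estimate_tokens(frag["text"])
        fits = used + cost <= budget
        if fits:
            selected.append(frag["path"])
            used += cost
        entries.append({
            "path": frag["path"],
            "tokens": cost,
            "relevance": frag["relevance"],
            "included": fits,
            "rationale": "fits budget" if fits else "over budget",
        })
    return selected, {"budget": budget, "used": used, "entries": entries}
```

Writing the returned manifest to a file next to the assembled context would give reviewers a record of what the agent saw and why, which is the role the manifest files play in the description above.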

3. Methods for Context Structuring and Persistent Memory

File-native systems unify context management at scale by leveraging structured, persistent file-based artifacts:

  • Formats and Schemas: Systems engineer context files in YAML, Markdown, JSON, or domain-compact notations (e.g., TOON), balancing grep/search efficiency, clarity, and machine parseability. Navigational manifests (e.g., navigator.md) provide summary metadata and disambiguation paths (McMillan, 5 Feb 2026).
  • Metadata Abstraction: For scale, LLM agents are fed only metadata summary vectors $M_f = (n_{\text{rows}}, \{(\text{col}_i, \tau_i, \mu_i, \sigma_i)\}, \text{sample})$ for each file $f$, not full raw data, enabling token budget control for hundreds of files (Chen et al., 3 Oct 2025).
  • Persistent Audit and Memory: All agentic actions produce or mutate files (logs, memory snapshots, code artifacts). Specialized memory types—scratchpads, episodic/session memory, fact memory, procedure/user memory—are stored in versioned directories under /context/memory/{agentID}/, governed by explicit policies and logged for replay and verification (Xu et al., 5 Dec 2025).
  • File-Based Tool Use: Agents manipulate files via standardized or MCP-exposed APIs (readFile, writeFile, deleteFile), as formalized in the agent’s tool space (V et al., 18 Jan 2026, Xu et al., 5 Dec 2025).

The net effect is orchestration of scalable, pipeline-parsimonious, and fully auditable reasoning via composition of small, addressable file artifacts.
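
The metadata summary idea can be sketched for a single CSV file as follows. The source does not define the per-column symbols, so this example assumes that each column entry holds a type, a mean, and a standard deviation; the function name `summarize_csv` and the output keys are hypothetical.

```python
import csv
import io
import statistics


def summarize_csv(text, sample_rows=2):
    """Build a compact summary (row count, column stats, sample) of one CSV file."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = []
    for name in (rows[0].keys() if rows else []):
        values = [row[name] for row in rows]
        try:
            nums = [float(v) for v in values]
            col_type = "numeric"
            mean = statistics.fmean(nums)
            std = statistics.pstdev(nums)   # population standard deviation
        except ValueError:
            # Any non-numeric cell makes the whole column "text".
            col_type, mean, std = "text", None, None
        columns.append({"col": name, "type": col_type, "mean": mean, "std": std})
    return {"n_rows": len(rows), "columns": columns, "sample": rows[:sample_rows]}
```

The summary size depends on the number of columns and sample rows rather than on the number of data rows, which is what keeps the token cost bounded as files grow.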

4. System-Level Issues: Branching, Scalability, and Efficiency

Scalability and interactive exploration in file-native agentic workflows necessitate dedicated system support beyond generic checkpoint/restore. Key dimensions include:

  • Snapshot and Restore Semantics: Traditional container and VM checkpointing (CRIU, Docker commit, Podman) incurs linear latencies (e.g., $T_{\text{CRIU}}(S) \approx 0.69\ \mathrm{s}/\mathrm{GiB}$ of "dirty" memory) and fails in multi-tenant or side-effectful deployments (e.g., shared sockets, cloud APIs) (Xu et al., 7 Oct 2025).
  • Copy-on-Write Branching: Native branching is realized via per-branch overlay deltas, with visibility $FS_b = FS_0 \oplus \delta_b$ and copy-on-write semantics for files, in-memory pages, and runtime heaps, enabling microsecond branching for high-fan-out exploration (Xu et al., 7 Oct 2025).
  • External Side-Effects: Fork-aware services (e.g., versioned S3 object stores) or interception proxies are necessary for correct side-effect management beyond the file system (Xu et al., 7 Oct 2025).
  • Context Partitioning and Navigation: For structured queries over schemas, domain-partitioned file sets and explicit navigation yield >90% accuracy at 10,000-table scale, provided per-query context size is bounded and navigation aids are clear (McMillan, 5 Feb 2026).
  • Runtime Efficiency: File operations may dominate token consumption if formats are not optimized for the agent's search patterns; compact, unfamiliar formats can yield orders-of-magnitude "grep tax" under large schemas (McMillan, 5 Feb 2026).

Collectively, these system-level optimizations are prerequisites for low-latency, massively parallel agentic workflows over persistent file contexts.
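
The overlay visibility rule for branches can be illustrated with a small in-memory sketch in which each branch reads through to a shared base and keeps its own writes in a private delta. This models file visibility only, not the memory-page or heap copy-on-write described above; the `Branch` class, the tombstone marker, and the example paths are hypothetical.

```python
_DELETED = object()  # tombstone marking a path removed within a branch


class Branch:
    """View of base overlaid with a per-branch delta.

    Reads fall through to the base when the branch has no entry;
    writes and deletes are recorded only in the branch's own delta.
    """

    def __init__(self, base):
        self.base = base     # shared base snapshot, never mutated here
        self.delta = {}      # per-branch overlay

    def read(self, path):
        if path in self.delta:
            value = self.delta[path]
            if value is _DELETED:
                raise FileNotFoundError(path)
            return value
        if path in self.base:
            return self.base[path]
        raise FileNotFoundError(path)

    def write(self, path, content):
        self.delta[path] = content

    def delete(self, path):
        self.delta[path] = _DELETED
```

Creating a branch here costs the same regardless of how large the base is, since nothing is copied up front. That property is what makes high-fan-out exploration cheap in this toy model.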

5. Evaluation Practices, Metrics, and Best Practices

Evaluation of file-native agentic systems is grounded in both operational metrics and application-specific accuracy:

  • Task-Centric Metrics:
    • Accuracy/Jaccard similarity of agent outputs to gold standards (e.g., $J(A, G) = \frac{|A \cap G|}{|A \cup G|}$, success if $J \geq 0.9$) (McMillan, 5 Feb 2026).
    • Number of successful file operations, error rate, and hallucination rate (attempts to access nonsensical or unsafe paths) (V et al., 18 Jan 2026).
    • Composite scoring: $\alpha \cdot (\mathrm{successfulOps} / \mathrm{totalOps}) - \beta \cdot \mathrm{errorRate}$.
  • Context Engineering Metrics:
    • Token and runtime costs for different schema/data formats and navigation architectures.
    • Navigation accuracy and scalability of multi-file partitioning.
  • Audit and Traceability: All read/write/exec actions are logged as immutable history, supporting end-to-end replay, provenance tracking, and human-in-the-loop audit (Xu et al., 5 Dec 2025).
  • Qualitative Dimensions: Maintainability (ease of extension), operational robustness (sandboxing, permission control), and auditability (transaction log completeness).
  • Best Practices:
    • Choose architecture and format according to model capability; optimize for token efficiency, human readability, and grep-friendliness case-by-case (McMillan, 5 Feb 2026).
    • Routinely partition schemas or file context for tasks with >500 files/tables.
    • Employ context constructors/loaders for explicit, justifiable context selection under token constraints.
    • Route intermediate reasoning and plans through persistent file logs for review and iterative refinement.
    • Summarize and compress memory buffers beyond fixed thresholds to remain within context window sizes.
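
The task-centric metrics in this section can be computed with a few lines of Python. This is a sketch under stated assumptions: the source gives no values for the weights, so the defaults `alpha=1.0` and `beta=1.0` are placeholders, and the handling of empty sets and of zero total operations is a convention chosen for the example.

```python
def jaccard(a, g):
    """Jaccard similarity between an agent's output set and the gold set."""
    a, g = set(a), set(g)
    if not a and not g:
        return 1.0   # assumption: two empty result sets count as a match
    return len(a & g) / len(a | g)


def is_success(a, g, threshold=0.9):
    """A task counts as successful when the similarity meets the threshold."""
    return jaccard(a, g) >= threshold


def composite_score(successful_ops, total_ops, error_rate, alpha=1.0, beta=1.0):
    """alpha * (successfulOps / totalOps) - beta * errorRate."""
    # Assumption: with no operations attempted, the success ratio is 0.
    ratio = successful_ops / total_ops if total_ops else 0.0
    return alpha * ratio - beta * error_rate
```

For example, an output that matches 9 of 10 gold rows and adds nothing extra has similarity 9/10, which just meets the 0.9 threshold.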

6. Application Domains and Empirical Case Studies

File-native agentic systems are instantiated across domains:

  • Software Engineering: Change-aware file-level software defect prediction leverages a multi-agent, file-native debate framework operating over code snapshots and deltas, rather than static snapshots, correcting label persistence bias and increasing sensitivity to defect transitions (Hesamolhokama et al., 29 Dec 2025).
  • Structured Data Querying: SQL-generation agents benefit from file-based context retrieval, with empirical evidence showing that model tier and retrieval architecture interact strongly and must be tuned empirically (McMillan, 5 Feb 2026).
  • Collaborative Data Science: CoDA employs a multi-agent file-native workflow for collaborative data visualization, orchestrating agents for metadata analysis, task planning, code generation, debugging, and iterative quality evaluation over multi-file datasets (Chen et al., 3 Oct 2025).
  • General Agentic Intelligence: LLM-in-Sandbox enables LLM agents to exploit a virtual file system for long-context reasoning, dynamic resource acquisition, and script-based solution construction, yielding quantitative gains in STEM domains and compression of prompt token budgets by up to 8x (Cheng et al., 22 Jan 2026).
  • Context Engineering: The AIGNE framework formally systematizes file-system-based context engineering, ensuring that all context assembly, delivery, and evaluation is persistent, governed, and transparent to both human and machine auditors (Xu et al., 5 Dec 2025).

These diverse applications demonstrate that file-native designs not only scale effectively in practice but also fundamentally improve auditability, maintainability, and composability compared to ephemeral, non-persistent, or monolithic prompt-based paradigms.

7. Open Challenges and Future Directions

Several critical challenges and future work directions remain in the file-native agentic paradigm:

  • Semantic Branching and Merging: Robust, performant support for forking and merging speculative branches with non-file side effects is unsolved at OS and API levels (Xu et al., 7 Oct 2025).
  • Security and Hallucination Containment: Prevention of prompt injection through files, stringent ACLs, and formal pre-condition validation are necessitated by the risk of over-permissive file operations (V et al., 18 Jan 2026).
  • Formal Verification: Synthesis, type-safe file APIs, and pre/postcondition enforcement are open research vectors for guaranteed semantic correctness.
  • Multi-Agent Coordination at Scale: Lock management, race condition avoidance, and human/AI curation protocols for collaborative editing scenarios require principled mechanisms rooted in file-level events and metadata.
  • Expressive, Extensible Context Abstractions: Research is ongoing into context engineering frameworks balancing compression, relevance, provenance, and token budget constraints for increasingly heterogeneous, dynamic agentic environments.

These active areas signal that while file-native agentic systems have established clear empirical and architectural advantages, they also drive new systems, security, and composability research for the next era of AI agent infrastructure.
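
As one small illustration of the pre-condition validation mentioned under security, the sketch below rejects requested paths that escape a sandbox root before any file operation runs. It is a lexical check only and does not resolve symlinks, so it is not a complete containment mechanism; the function name `validate_path` and the `/sandbox` root are hypothetical.

```python
import posixpath


def validate_path(root, requested):
    """Return the normalized absolute path if it stays inside `root`.

    `root` must be an absolute POSIX path. Raises PermissionError when the
    requested path resolves (lexically) to a location outside the root.
    """
    root_norm = posixpath.normpath(root)
    # join() discards `root` when `requested` is absolute, so absolute
    # paths outside the sandbox are caught by the check below.
    candidate = posixpath.normpath(posixpath.join(root_norm, requested))
    # commonpath compares whole path components, so "/sandboxed" does not
    # count as being inside "/sandbox".
    if posixpath.commonpath([root_norm, candidate]) != root_norm:
        raise PermissionError(f"path escapes sandbox: {requested}")
    return candidate
```

An agent tool wrapper could call this before every read or write and count the rejections, giving a simple measurement of attempts to access unsafe paths.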
