Formal Architecture Descriptors as Navigation Primitives for AI Coding Agents

Published 11 Apr 2026 in cs.SE and cs.AI | (2604.13108v1)

Abstract: AI coding agents spend a substantial fraction of their tool calls on undirected codebase exploration. We investigate whether providing agents with formal architecture descriptors can reduce this navigational overhead. We present three complementary studies. First, a controlled experiment (24 code localization tasks x 4 conditions, Claude Sonnet 4.6, temperature=0) demonstrates that architecture context reduces navigation steps by 33-44% (Wilcoxon p=0.009, Cohen's d=0.92), with no significant format difference detected across S-expression, JSON, YAML, and Markdown. Second, an artifact-vs-process experiment (15 tasks x 3 conditions) demonstrates that an automatically generated descriptor achieves 100% accuracy versus 80% blind (p=0.002, d=1.04), proving direct navigational value independent of developer self-clarification. Third, an observational field study across 7,012 Claude Code sessions shows 52% reduction in agent behavioral variance. A writer-side experiment (96 generation runs, 96 error injections) reveals critical failure mode differences: JSON fails atomically, YAML silently corrupts 50% of errors, S-expressions detect all structural completeness errors. We propose intent.lisp, an S-expression architecture descriptor, and open-source the Forge toolkit.


Authors (1)

Ruoqi Jin

Summary

The paper introduces formal architecture descriptors to address the Navigation Paradox by reducing undirected exploration in code navigation tasks.
In the artifact-vs-process experiment, an auto-generated descriptor with no manual refinement reached 100% accuracy versus 80% for blind navigation; in the controlled experiment, architecture context reduced navigation steps by 33–44%.
An observational field study of 7,012 Claude Code sessions found a 52% reduction in agent behavioral variance, and a separate writer-side experiment compared how each descriptor format fails under injected errors.

Introduction

The paper "Formal Architecture Descriptors as Navigation Primitives for AI Coding Agents" (2604.13108) addresses inefficiencies that LLM-based autonomous code agents, such as Claude Code and Copilot Workspace, encounter when navigating and editing real-world codebases. Specifically, these agents spend a considerable portion of their tool calls on undirected exploration to reconstruct architectural context, a phenomenon labeled the Navigation Paradox. The author proposes to reduce this overhead with formal, machine-parseable architecture descriptors, instantiated here as intent.lisp, a tree-structured S-expression format covering module boundaries, signatures, design constraints, and data flow annotations.

Methodology

The author conducted three principal empirical investigations:

Controlled Contextualization Study: Evaluation of navigation efficiency on localization tasks with architecture context versus a blind baseline, with equivalent descriptors rendered in S-expression, JSON, YAML, and Markdown.
Artifact-vs-Process Experiment: Disentanglement of descriptor artifact value from human clarification benefits, via testing the efficacy of auto-generated descriptors with no manual refinement.
Field Deployment Observation: Longitudinal analysis of real-world agent sessions to quantify behavioral variance pre- and post-adoption of formal descriptors.

The intent.lisp format features a constrained hierarchical syntax suitable for automated parsing and efficient compression, designed to facilitate both LLM generation and structured tool consumption.
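To make the idea of a constrained, tree-structured descriptor concrete, the following is a minimal sketch of how a tool might parse such a file and look up module locations. The summary does not reproduce the actual intent.lisp schema, so the field names used here (`system`, `module`, `path`, `exports`, `constraint`, `depends-on`) and the helper `module_paths` are hypothetical, and the parser handles only bare atoms and parentheses.

```python
from typing import List, Union

SExpr = Union[str, List["SExpr"]]


def tokenize(text: str) -> List[str]:
    """Split S-expression text into parenthesis and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(text: str) -> SExpr:
    """Parse one S-expression; raise ValueError on unbalanced parentheses."""
    tokens = tokenize(text)
    pos = 0

    def read() -> SExpr:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input (missing ')')")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items: List[SExpr] = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("unexpected end of input (missing ')')")
            pos += 1  # consume the closing ')'
            return items
        if tok == ")":
            raise ValueError("unexpected ')'")
        return tok

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after top-level form")
    return tree


# Hypothetical descriptor: the schema below is illustrative, not the paper's.
DESCRIPTOR = """
(system shop
  (module auth
    (path src/auth)
    (exports login logout)
    (constraint no-direct-db-access))
  (module billing
    (path src/billing)
    (exports charge refund)
    (depends-on auth)))
"""


def module_paths(tree: SExpr) -> dict:
    """Map each module name to its declared path."""
    paths = {}
    for form in tree[2:]:  # skip the 'system' tag and the system name
        if isinstance(form, list) and form and form[0] == "module":
            name = form[1]
            for field in form[2:]:
                if isinstance(field, list) and field and field[0] == "path":
                    paths[name] = field[1]
    return paths


print(module_paths(parse(DESCRIPTOR)))
```

An agent holding a lookup table like this could jump directly to `src/billing` for a billing-related task instead of listing directories, which is the kind of exploration the descriptor is meant to replace.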

Key Findings

The controlled study found that providing agents with architectural context in any evaluated format substantially reduced navigation effort, with 33–44% fewer navigation steps than the blind baseline (Wilcoxon $p = 0.009$, Cohen's $d = 0.92$). No statistically significant difference was detected between the S-expression, JSON, YAML, and Markdown representations, suggesting that the model is largely insensitive to input syntax when descriptors are correctly structured.
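For readers who want to see how a percentage reduction and a paired effect size of this kind are computed, here is a small sketch using only the Python standard library. The step counts are synthetic and are not data from the paper, and the summary does not state which variant of Cohen's $d$ was used; this sketch assumes the paired form (mean difference divided by the standard deviation of the differences).

```python
from statistics import mean, stdev

# Synthetic navigation-step counts for six tasks (NOT data from the paper).
blind_steps = [12, 9, 15, 10, 14, 8]
context_steps = [7, 6, 9, 7, 8, 5]


def percent_reduction(baseline, treated):
    """Percentage drop in total steps relative to the baseline condition."""
    return 100.0 * (sum(baseline) - sum(treated)) / sum(baseline)


def paired_cohens_d(baseline, treated):
    """Paired effect size: mean of per-task differences over their sample SD."""
    diffs = [b - t for b, t in zip(baseline, treated)]
    return mean(diffs) / stdev(diffs)


reduction = percent_reduction(blind_steps, context_steps)
effect = paired_cohens_d(blind_steps, context_steps)
print(f"reduction: {reduction:.1f}%  paired d: {effect:.2f}")
```

A Wilcoxon signed-rank test on the same task pairs could be run with `scipy.stats.wilcoxon`; it is left out here to keep the sketch dependency-free.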

The artifact-vs-process experiment tested whether the value of a formal descriptor depends on developer-driven reorganization or refinement. An auto-generated descriptor achieved 100% task completion accuracy versus 80% for blind navigation ($p = 0.002$, $d = 1.04$), with no manual intervention or codebase restructuring. This result indicates that the descriptor has direct navigational value as an artifact, beyond any benefit developers gain from clarifying the architecture while writing it.

Behavioral Variance and Real-World Deployment

Analysis of 7,012 code agent sessions revealed a 52% reduction in the interquartile range of explore/edit ratios following the introduction of formal descriptors, signaling more predictable agent behavior and tighter bounds on worst-case sessions. However, within the post-adoption period, differences in how much sessions consumed the descriptor were not associated with further reductions, which suggests that part of the benefit comes from developer-side architectural formalization (the self-clarification effect).
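The variance measure described above can be illustrated with a short sketch that computes the interquartile range (IQR) of per-session explore/edit ratios before and after adoption. The ratios below are synthetic, and the quantile method (`inclusive`) is an assumption, since the summary does not say how the paper computed its quartiles.

```python
from statistics import quantiles

# Synthetic explore/edit ratios per session (NOT data from the field study).
pre_adoption = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
post_adoption = [1.5, 1.75, 2.0, 2.25, 2.5, 2.75, 3.0, 3.25, 3.5]


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def iqr_reduction(before, after):
    """Fractional shrinkage of the IQR from the 'before' to the 'after' sample."""
    return 1.0 - iqr(after) / iqr(before)


shrinkage = iqr_reduction(pre_adoption, post_adoption)
print(f"IQR before: {iqr(pre_adoption):.2f}, after: {iqr(post_adoption):.2f}, "
      f"reduction: {shrinkage:.0%}")
```

The IQR is a reasonable choice for this kind of data because a handful of runaway exploration sessions would inflate a standard deviation far more than they move the quartiles.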

Descriptor Robustness and Compression

Generation and error-resilience benchmarks highlighted nontrivial trade-offs for descriptor format selection:

JSON: 100% syntactic reliability in generation, but a single error makes the whole document unparseable (atomic failure); silent corruption is lower (21%) than the 50% reported for YAML.
S-expressions: High resilience to structural errors; errors are detectable and affect only subtrees without total data loss; achieves superior compression (up to 64:1, mean 34:1).
YAML/Markdown: Prone to silent corruption and unstructured failure, rendering them unsuited for governance-centric automation.

No single format unconditionally dominates, but the syntactic discipline and structural error resilience of S-expressions position them as preferable for scaling architectural context transfer.
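The contrast between atomic failure and subtree-local failure can be sketched with a truncated descriptor in each format. This is an illustration under stated assumptions, not the paper's error-injection benchmark or the Forge toolkit: the descriptor contents are hypothetical, `recover_children` is a helper written for this example, and it ignores parentheses inside string literals.

```python
import json


def try_json(text):
    """Return the parsed JSON document, or None if it fails to parse at all."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def recover_children(text):
    """Collect complete second-level forms and report whether the input is damaged.

    Returns (children, damaged), where damaged is True if parentheses
    do not balance anywhere in the input.
    """
    children, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
            if depth == 2:
                start = i  # a direct child of the top-level form begins here
        elif ch == ")":
            if depth == 2 and start is not None:
                children.append(text[start:i + 1])
                start = None
            depth -= 1
            if depth < 0:
                return children, True  # stray closing parenthesis
    return children, depth != 0


# Both inputs are cut off partway through the second module (hypothetical content).
truncated_json = '{"modules": [{"name": "auth", "path": "src/auth"}, {"name": "billing", "path": "src/bil'
truncated_sexpr = "(system shop (module auth (path src/auth)) (module billing (path src/bil"

json_result = try_json(truncated_json)
children, damaged = recover_children(truncated_sexpr)

print("JSON result:", json_result)
print("S-expression children recovered:", children)
print("S-expression damage detected:", damaged)
```

With the standard JSON parser the truncated document yields nothing, while the parenthesis scan still recovers the intact `auth` module and flags that the input is structurally incomplete. A lenient or streaming JSON parser could narrow this gap, so the sketch shows the default behavior rather than a fundamental limit of either format.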

Implications and Theoretical Impact

This research provides evidence that formal, structured architecture descriptors offer practical navigation benefit and robustness for LLM-based coding agents. The results also indicate that format selection should prioritize structural error detection and compression rather than any presumed LLM preference for a particular syntax, since no such preference was detected. On the reported metrics, the S-expression-based intent.lisp avoids the limitations of ad hoc Markdown files and outperforms YAML on both automation and reliability.

Practically, artifact generation can be fully automated and the result productively consumed without adapting the codebase, which lowers deployment friction. These findings point toward scalable architectures for AI-led multi-agent development, in which access to architectural context could remain cheap as codebase size increases, although scaling behavior on much larger codebases has not yet been evaluated.

Theoretically, the results suggest that optimizing context format offers limited returns for LLM prompt engineering, and they redirect focus toward artifact resilience, error recoverability, and integration with downstream tooling. They also motivate future work on multi-agent system partitioning over tree-structured architectural graphs, a capability that the current results identify as enabled but not yet evaluated.

Future Directions

The paper's results advocate for:

Systematic evaluation on much larger codebases to validate scaling behavior;
Exploration of multi-agent architectural partitioning leveraging hierarchical descriptors;
Enhanced integration between artifact generation tools and LLM structured output APIs, particularly for S-expression support;
Application of formal architecture descriptors in automated governance and coordination tasks beyond code localization.

Conclusion

The introduction of formal architecture descriptors, specifically through a parsimonious S-expression interface, yields quantifiable navigation improvements and more regular behavior in AI coding agents, without manual refinement or codebase alteration. The core advantages of the S-expression format lie in error resilience, structural integrity, and compression rather than in any LLM preference for its syntax. This work reframes best practices in context provisioning for code agents toward formalization, robustness, and automated governance, providing empirical grounding for future systems in scalable AI-driven software engineering (2604.13108).
