The Ann Arbor Architecture for Agent-Oriented Programming (2502.09903v1)

Published 14 Feb 2025 in cs.AI, cs.HC, and cs.SE

Abstract: In this paper, we reexamine prompt engineering for LLMs through the lens of automata theory. We argue that LLMs function as automata and, like all automata, should be programmed in the languages they accept, a unified collection of all natural and formal languages. Therefore, traditional software engineering practices--conditioned on the clear separation of programming languages and natural languages--must be rethought. We introduce the Ann Arbor Architecture, a conceptual framework for agent-oriented programming of LLMs, as a higher-level abstraction over raw token generation, and provide a new perspective on in-context learning. Based on this framework, we present the design of our agent platform Postline, and report on our initial experiments in agent training.

Summary

  • The paper proposes the Ann Arbor Architecture (AAA), a framework for programming LLM agents grounded in automata theory, treating prompt engineering as the primary programming activity and unifying natural and formal languages.
  • The AAA emphasizes a continuous development process built on asynchronous message exchange recorded in an immutable journal, with dynamic context management via Memory Segment Rewrite (MSR).
  • The Postline prototype and its experiments, notably the failure of static initialization from summaries, suggest that interaction history, more than static knowledge, is what makes agent behavior robust.

The paper "The Ann Arbor Architecture for Agent-Oriented Programming" (2502.09903) proposes a reconceptualization of programming LLMs by viewing them through the lens of automata theory. It argues that LLMs, as automata capable of processing both natural and formal languages, necessitate a departure from traditional software engineering paradigms that strictly separate these language modalities. The paper introduces the Ann Arbor Architecture (AAA), a conceptual framework for agent-oriented programming built upon this premise, and details Postline, a prototype platform implementing the AAA.

Automata Theory Perspective on LLM Programming

The foundational argument posits that LLMs function akin to automata: they process sequential input tokens, update internal states, and generate output tokens. Following the principle that automata are programmed in the languages they accept, the paper contends that LLMs should be programmed using the unified set of all languages they understand – encompassing both natural languages (e.g., English) and formal languages (e.g., Python, JSON, mathematical notation). This perspective reframes prompt engineering as the fundamental programming activity for LLMs. It critiques conventional software engineering practices, which maintain a rigid distinction between human-readable natural language (documentation, specifications) and machine-executable formal language (code), arguing that this dichotomy hinders the effective utilization of LLMs' unique capabilities. The paper advocates for embracing the inherent language unification within LLMs to develop more powerful and flexible agent systems.
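
In this view, an LLM is a transition system whose state is the token sequence accumulated so far. The toy sketch below makes the analogy concrete; next_token is a stand-in for an LLM call, and none of the names come from the paper.

```python
# A toy sketch of the automata view: the automaton's state is the token
# sequence so far, and one transition appends the next token. The names
# here are illustrative; `next_token` stands in for an LLM call.

def next_token(state: list[str]) -> str:
    # Toy transition function; a real system would invoke an LLM here.
    return "<eos>" if len(state) >= 8 else f"tok{len(state)}"

def run(program: list[str], max_steps: int = 256) -> list[str]:
    # The "program" is just input tokens: English, Python, or JSON are
    # all fed through the same transition function.
    state = list(program)
    for _ in range(max_steps):
        token = next_token(state)
        if token == "<eos>":      # halting condition
            break
        state.append(token)       # state update = append the emitted token
    return state

print(run(["Write", "a", "haiku"]))
```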

The Ann Arbor Architecture (AAA)

The AAA is presented as a high-level conceptual framework for agent-oriented programming, moving beyond specific task implementations or simple prompt chaining. It emphasizes continuous interaction, persistent memory, and agent evolution. Key tenets include:

  • Interaction Model: Inspired by the email system, agents possess unique identifiers and interact solely through asynchronous message exchange. An agent's behavior is determined by its complete interaction history.
  • Persistent Memory (Journal): An agent's primary memory is its journal, an append-only log of all messages sent and received, conceptually structured using the MBox format (see the sketch after this list). This ensures the agent's entire history informs its subsequent actions.
  • Unified Process: The distinction between development, training, testing, and deployment is dissolved. All phases are part of a continuous conversational process recorded in the journal. An agent might be trained via simulated interactions and then handle real tasks within the same framework.
  • Tool Integration (Robots): Agents can interact with external tools and environments (terminals, browsers, APIs) via robots – non-intelligent adapters, each potentially addressable, that translate between the agent's message format and the tool's interface.
  • LLMs as Utilities: LLMs are treated as interchangeable computational resources, potentially selected dynamically based on task requirements, cost, or capability.
  • Memory Management and Evolution: Recognizing the potential scale of journals, the AAA incorporates mechanisms for memory management. The Memory Segment Rewrite (MSR) primitive allows agents to modify their own effective history (context) by issuing specific instructions. Agents can also clone themselves (creating identical copies with shared history up to the point of cloning) or split their journals to create specialized descendant agents, enabling evolutionary development paradigms.
  • In-Context Learning as Memory Episodes: The paper reinterprets in-context learning not as static few-shot examples but as dynamic, multi-turn "memory episodes" within the journal. These episodes capture the process of task execution, including errors, corrections, explanations, and final solutions, simulating a more robust learning process through interaction history.
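
The journal and memory-episode ideas can be made concrete with Python's standard-library mailbox module, which reads and writes MBox files. The sketch below is illustrative only: the addresses, subjects, and episode content are assumptions, not Postline's actual format. It records one memory episode in which a mistyped command and its correction both remain in the append-only log.

```python
# A minimal sketch of an append-only MBox journal using Python's
# standard-library `mailbox` module. Addresses, subjects, and episode
# content are illustrative assumptions, not Postline's actual format.
import mailbox
from email.message import EmailMessage

def append(journal: mailbox.mbox, sender: str, to: str,
           subject: str, body: str) -> None:
    msg = EmailMessage()
    msg["From"], msg["To"], msg["Subject"] = sender, to, subject
    msg.set_content(body)
    journal.add(msg)  # append-only: existing messages are never rewritten

journal = mailbox.mbox("agent.journal.mbox")

# One "memory episode": the error and its correction both stay in the
# journal, so future behavior is conditioned on the process, not just
# the final answer.
append(journal, "user@localdomain", "agent@agents.localdomain",
       "task", "List the block devices as JSON.")
append(journal, "agent@agents.localdomain", "shell@robots.localdomain",
       "run", "lsblk --jsn")   # mistyped flag: the failure is preserved
append(journal, "shell@robots.localdomain", "agent@agents.localdomain",
       "result", "lsblk: invalid option -- 'jsn'")
append(journal, "agent@agents.localdomain", "shell@robots.localdomain",
       "run", "lsblk --json")  # corrected retry
journal.flush()
```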

Postline Platform Design

Postline serves as a proof-of-concept implementation of the AAA. Its design choices reflect the architectural principles:

  • System Architecture: Postline is designed as a cloud-native system utilizing components like Kafka for persistent journaling, MinIO/S3 for context storage, stateful field servers for processing agent interactions, and lock servers for managing concurrent access to agent contexts.
  • Journal vs. Context: A crucial distinction is made between the journal (the immutable, append-only Kafka log of all messages, including MSR commands) and the context (the derived representation of an agent's history sent to the LLM, reflecting the effects of MSR operations). The context is stored in memory or on disk (MinIO/S3).
  • Memory Segment Rewrite (MSR) Implementation: MSR is implemented via messages sent to a special system@localdomain address with a subject like "MSR: XXX-YYY", where XXX and YYY are X-Serial identifiers referencing message ranges in the context. The field server processes this by replacing the specified segment in the context with the body of the MSR message, while the MSR message itself is simply appended to the immutable journal (see the first sketch after this list).
  • Agent Management: Agents are implicitly created when a message is first sent to an address within a designated agent namespace (e.g., agents.localdomain). Cloning is supported through hierarchical naming conventions: sending a message to a hierarchical child address within that namespace initiates a clone of the agent at the parent address (see the second sketch after this list).
  • Scalability (Worlds and Realms): Postline uses worlds as isolated namespaces and partitions computation within a world using realms. Each field is managed by a field server (an OS process). Context locking ensures that an agent's state is actively processed by only one field server at a time.
  • Metadata: Extended email-style headers (X-Serial, X-Total-Tokens, X-Hint-Model, X-Realm) are used to manage context and interaction metadata.
  • Privacy: The paper acknowledges the need for privacy and information control, suggesting future LLM APIs could support encrypted interactions, enabling agents to manage their own information sovereignty.
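
A rough sketch of how a field server might apply an MSR command, assuming numeric X-Serial values in the "MSR: XXX-YYY" subject; the data structures and names are illustrative, not Postline's implementation. Note that the journal is never touched; only the derived context is rewritten.

```python
# A minimal sketch of applying a Memory Segment Rewrite to the derived
# context, assuming numeric X-Serial values. Data structures and names
# are illustrative, not Postline's implementation.
import re
from dataclasses import dataclass

@dataclass
class ContextMessage:
    serial: int   # value of the message's X-Serial header
    text: str

def apply_msr(context: list[ContextMessage],
              subject: str, body: str) -> list[ContextMessage]:
    """Replace the segment spanning serials XXX..YYY (inclusive) with a
    single message carrying the MSR body. The MSR message itself goes to
    the immutable journal elsewhere; only the context changes here."""
    m = re.fullmatch(r"MSR:\s*(\d+)-(\d+)", subject)
    if not m:
        raise ValueError(f"not an MSR subject: {subject!r}")
    lo, hi = int(m.group(1)), int(m.group(2))
    before = [c for c in context if c.serial < lo]
    after = [c for c in context if c.serial > hi]
    return before + [ContextMessage(lo, body)] + after

# Example: collapse a verbose exchange (serials 7..12) into a one-line
# summary, shrinking X-Total-Tokens for subsequent LLM calls.
ctx = [ContextMessage(i, f"message {i}") for i in range(1, 15)]
ctx = apply_msr(ctx, "MSR: 7-12",
                "Summary: generated the image and attached it as base64.")
```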
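
Implicit creation and cloning could be resolved along these lines, assuming a dotted child-address convention under agents.localdomain; the exact naming scheme and all identifiers here are assumptions for illustration.

```python
# A minimal sketch of implicit agent creation and cloning, assuming a
# dotted child-address convention under the agent namespace; the exact
# naming scheme is an assumption for illustration.
AGENT_DOMAIN = "agents.localdomain"   # the designated agent namespace

def parent_address(address: str) -> str | None:
    """For a hypothetical child 'alice.helper@agents.localdomain',
    return 'alice@agents.localdomain'; None for a root agent."""
    local, domain = address.split("@", 1)
    if domain != AGENT_DOMAIN or "." not in local:
        return None
    return f"{local.rsplit('.', 1)[0]}@{domain}"

def deliver(address: str, journals: dict[str, list[str]]) -> None:
    # Agents come into existence on first message. A hierarchical child
    # starts as a clone, inheriting the parent's journal up to now.
    if address not in journals:
        parent = parent_address(address)
        inherited = journals.get(parent, []) if parent else []
        journals[address] = list(inherited)
```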

Experimental Findings

Initial experiments using Postline with GPT-4o were conducted to validate core AAA concepts:

  • Tool Use: An agent successfully interacted with a shell robot, executing commands (lsblk) and processing the JSON-formatted output (see the sketch after this list).
  • Code Generation and Execution: The agent demonstrated the ability to write, execute via the shell robot, and iteratively debug simple Python and C++ programs. It also modified Postline's own source code and generated Git commits.
  • Non-Text Data Handling: The agent managed binary data by using shell tools (ImageMagick, base64) to generate an image and then correctly constructing an MBox message with a multipart/mixed structure and a base64-encoded image attachment, albeit requiring some corrective interaction.
  • Agent Creation and Cloning: Dynamic agent creation and interaction between parent and child agents were demonstrated. Issues with resource consumption during recursive cloning highlighted the need for control mechanisms.
  • Memory Segment Rewrite (MSR): An agent successfully identified and used MSR to remove a redundant segment from its context related to the image generation task, verified by a decrease in X-Total-Tokens.
  • Failure of Static Initialization ("Bible"): A significant negative result was reported when attempting to initialize new agents using a condensed, static summary ("Bible") derived from a successful agent's journal. Agents initialized this way failed to replicate the interaction protocols and learned behaviors (e.g., correct JSON formatting for the shell robot) exhibited by the original agent. This failure underscores the AAA's emphasis on the process captured within the full interaction history (journal) and dynamic learning through memory episodes, arguing against the sufficiency of static knowledge dumps for initializing complex agent behaviors.
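
A shell robot of the kind used in the tool-use experiment can be sketched as a thin, non-intelligent adapter: run the command, reply with JSON-formatted output. The message shape here is an assumption; only the run-and-wrap pattern follows the description above.

```python
# A minimal sketch of a non-intelligent shell robot: it runs the command
# from an incoming message and replies with JSON-formatted output.
import json
import subprocess

def shell_robot(command: str, timeout: int = 30) -> str:
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=timeout)
    return json.dumps({
        "command": command,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    })

# e.g. the command from the tool-use experiment:
print(shell_robot("lsblk --json"))
```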

Rethinking Software Engineering Practices

The AAA framework and the Postline experiments suggest several shifts from traditional software engineering:

  • Language Unification: The boundary between formal code and natural language documentation/communication becomes porous. The primary artifact potentially shifts from distinct code/doc files to the agent's journal, from which various representations can be generated.
  • Continuous Interaction: The traditional waterfall or agile cycles involving distinct specification, development, testing, and deployment phases are replaced by a continuous loop of interaction, refinement, and evolution captured within the agent's journal.
  • Flexibility over Rigid Structures: The framework critiques the rigidity of predefined computational graphs or state machines common in other agent platforms, arguing they are artifacts of older paradigms unsuited to the fluid nature of LLM interactions. AAA prioritizes the dynamic history over static structure.
  • Primacy of Interaction History: The "Bible" experiment highlights that the process of interaction, including the errors and corrections recorded in the journal, is more critical for robust agent behavior than a static summary of knowledge or capabilities. Refactoring or summarizing memory, even when it looks semantically equivalent to a human reader, can detrimentally affect LLM performance due to the nature of attention mechanisms and sequential processing.

In conclusion, the Ann Arbor Architecture proposes an automata-theoretic foundation for programming LLM-based agents, emphasizing persistent, append-only memory (journaling), asynchronous message exchange as the sole mechanism of interaction, and a continuous development process. The Postline prototype and initial experiments, particularly the failure of static initialization, provide preliminary support for the framework's focus on dynamic interaction history over static knowledge representations and rigid predefined structures.
