Agent Foundation Models (AFMs)

Updated 2 July 2026

Agent Foundation Models are systems that combine large-scale pre-trained models with agentic modules for dynamic memory, planning, and external tool invocation.
They utilize a multi-layer architecture to enable robust multi-agent coordination and scalable, adaptive service deployment across heterogeneous environments.
Current research focuses on optimizing training paradigms, enhancing real-time inference, and addressing challenges in efficiency, personalization, and secure multi-agent collaboration.

Agent Foundation Models (AFMs) are a class of systems that integrate large pre-trained foundation models (FMs)—such as LLMs, vision-LLMs, and multi-modal FMs—with agentic architectures. These architectures equip the system with dynamic memory, planning, external tool invocation, and multi-agent coordination capabilities to deliver real-time, personalized, and goal-driven behavior. This approach aims to establish the computational substrate for scalable, adaptive, and robust artificial intelligence agents, bridging the gap between static FM inference and autonomous, context-aware agent services (Xu et al., 2024).

1. Foundations and Core Principles

AFMs unify and extend pure FM inference by embedding agentic modules for memory retention, high-level task decomposition, external environment interaction, and multi-agent orchestration. Unlike traditional LLMs or VLMs limited to question-answering or next-token prediction, AFMs operate within a sense–plan–act feedback loop, enabling explicit long- and short-term memory management, task planning, API/tool usage for real-world interaction, and robust response under resource and distributional constraints (Xu et al., 2024, Chen et al., 13 Oct 2025, Hu et al., 9 Dec 2025).

Key distinctions:

Dynamic Memory: AFMs retain both episodic and working memory, often coupling vector DBs or memory caches with context retrieval strategies.
Planning: High-level user requests are decomposed into subtasks or action plans, either via internal chain-of-thought reasoning or explicit planning modules.
Tool Use: Direct calls to APIs, web search, run-code, or other tools are executed as part of agentic trajectories, facilitating up-to-date and specialized computation.
Multi-Agent Coordination: Systems are increasingly endowed with mechanisms for agent-to-agent communication, negotiation, and collaborative adaptation beyond single-agent policy optimization (Hu et al., 9 Dec 2025, Mamie et al., 7 Mar 2025).
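
The sense–plan–act loop behind these distinctions can be illustrated with a minimal sketch. Everything in it is a stand-in for illustration: the `Agent` class, the `"tool: argument; ..."` request syntax, and the two toy tools are assumptions, not part of any cited system. A real AFM would delegate planning to the foundation model rather than to string splitting.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Agent:
    """Toy agent: plans subtasks, calls tools, and logs results in memory."""
    tools: Dict[str, Callable[[str], str]]
    memory: List[Tuple[str, str]] = field(default_factory=list)  # episodic log

    def plan(self, request: str) -> List[Tuple[str, str]]:
        # Hypothetical planner: each "tool: argument" clause separated by ';'
        # becomes one subtask. An AFM would use the FM for this step.
        steps = []
        for clause in request.split(";"):
            name, _, arg = clause.strip().partition(":")
            steps.append((name.strip(), arg.strip()))
        return steps

    def act(self, request: str) -> List[str]:
        outputs = []
        for name, arg in self.plan(request):
            tool = self.tools.get(name)
            result = tool(arg) if tool else f"no tool named {name!r}"
            # Store (subtask, result) so later steps could retrieve it.
            self.memory.append((f"{name}:{arg}", result))
            outputs.append(result)
        return outputs


agent = Agent(tools={
    "upper": lambda s: s.upper(),
    "count": lambda s: str(len(s.split())),
})
results = agent.act("upper: hello world; count: one two three")
print(results)       # ['HELLO WORLD', '3']
print(agent.memory)  # two (subtask, result) entries
```

The memory here is a plain list; coupling it to a vector store with retrieval, as described above, would replace the append with an embedding-and-index step.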

This generalization has shifted the operational substrate for AI agents from static, monolithic FMs to modular, robust, and extensible agent architectures with measurable advances in autonomy, proactivity, and adaptability.

2. Architectures and Deployment Frameworks

The 5-layer reference architecture for AFMs, as described in (Xu et al., 2024), is designed to optimize agent service deployment over heterogeneous hardware (cloud, edge, mobile):

Application Layer: Handles user interface, client APIs, batch scheduling, and QoS enforcement.
Agent Layer: Implements orchestration, planning, memory management, and tool invocation. It is the locus for core agent logic, including subgoal decomposition, context retrieval, and execution routing.
Model Layer: Hosts the compressed/quantized FM(s), including token-efficient variants optimized for throughput and memory footprint.
Resource Layer: Governs parallelism strategies (data, model, pipeline), load balancing, and autoscaling across available compute resources.
Execution Layer: Connects directly to device back-ends (CPU, GPU, FPGA, ASIC, IMC) and is responsible for low-level FM inference optimization, including kernel fusion and I/O scheduling.

System-level scalability is achieved via parameter sharding (bipartite matching to maximize KV-cache reuse), dynamic autoscaling, and elastic parallelism. Together, these allow resource allocation and batching strategies to be tuned adaptively to meet latency, throughput, and accuracy targets (Xu et al., 2024).
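
As a rough illustration of dynamic autoscaling, the sketch below sizes a replica pool from the observed request rate. The function name, the `headroom` utilisation target, and the replica bounds are hypothetical choices for this example. The cited work does not prescribe this rule.

```python
import math


def replicas_needed(request_rate_qps: float, per_replica_qps: float,
                    min_replicas: int = 1, max_replicas: int = 64,
                    headroom: float = 0.8) -> int:
    """Replicas required so each runs at or below `headroom` utilisation."""
    if per_replica_qps <= 0 or not 0 < headroom <= 1:
        raise ValueError("per_replica_qps must be > 0 and headroom in (0, 1]")
    # Capacity per replica is derated by the headroom factor.
    wanted = math.ceil(request_rate_qps / (per_replica_qps * headroom))
    # Clamp to the configured pool bounds.
    return max(min_replicas, min(max_replicas, wanted))


print(replicas_needed(900, 100))   # 12
print(replicas_needed(0, 100))     # 1 (never scale below the minimum)
```

A production autoscaler would also smooth the rate estimate and add cool-down periods to avoid oscillation. Both are omitted here for brevity.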

3. Learning Paradigms and Optimization

AFM training and inference draw on multiple paradigms:

Supervised Multi-Agent Distillation: (Li et al., 6 Aug 2025) distills expert trajectories from orchestrated multi-agent systems into unified models; each step is annotated with explicitly activated agent modules or tools, facilitating full end-to-end fine-tuning from coordinated demonstrations.
Adaptive Policy Optimization (APO): (Chen et al., 13 Oct 2025) incorporates mode-routing (instant, reasoning, agentic) and applies RL-based cost-regularized reward shaping to jointly optimize accuracy and efficiency. The objective:

$J_\text{APO}(\theta) = \mathbb{E}_{x \sim D}\left[ \mathbb{E}_{m \sim \pi_\text{route}(\cdot|x)} [ r(m; x) - \lambda \, c(m; x) ] \right]$

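where $\pi_\text{route}$ is the mode-routing policy, $r(m; x)$ combines correctness, output-format compliance, and cost-minimizing mode selection, $c(m; x)$ denotes the cost of answering input $x$ in mode $m$, and $\lambda$ sets the strength of the cost penalty.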

Multi-Agent Reinforcement Learning/Self-Play: (Hu et al., 9 Dec 2025) and related efforts highlight co-evolution of agent pools, population-based training, and multi-agent RLHF (with feedback on both outcome and quality of collaboration/negotiation).
Inference Acceleration and Model Compression: Techniques include pruning (weight/head ablation), quantization (e.g., W4A8 on edge devices), student–teacher distillation, and aggressive token reduction via attention-based pruning or summarization, with speedups of up to 20× reported for streaming settings (Xu et al., 2024).
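
The APO objective $J_\text{APO}$ stated earlier in this section can be evaluated directly once the routing probabilities, rewards, and costs are known for each prompt. The sketch below does this for a small batch. The dictionary layout and the numbers are assumptions made for illustration and are not taken from the cited paper. In training, the expectation would be estimated from sampled rollouts rather than enumerated.

```python
from typing import Dict, List

MODES = ("instant", "reasoning", "agentic")


def apo_objective(batch: List[Dict[str, Dict[str, float]]], lam: float) -> float:
    """Mean over prompts x of E_{m ~ pi_route(.|x)}[ r(m; x) - lam * c(m; x) ]."""
    if not batch:
        raise ValueError("batch must contain at least one prompt")
    total = 0.0
    for x in batch:
        # Inner expectation over the three routing modes.
        total += sum(
            x["pi_route"][m] * (x["reward"][m] - lam * x["cost"][m])
            for m in MODES
        )
    return total / len(batch)


# Illustrative numbers only.
prompt = {
    "pi_route": {"instant": 0.5, "reasoning": 0.25, "agentic": 0.25},
    "reward":   {"instant": 0.0, "reasoning": 1.0,  "agentic": 1.0},
    "cost":     {"instant": 1.0, "reasoning": 2.0,  "agentic": 4.0},
}
j = apo_objective([prompt], lam=0.125)
print(j)  # 0.25
```

Raising `lam` lowers the value of the expensive agentic mode faster than that of the cheaper modes, which is how the penalty steers the router toward cheaper modes when they suffice.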

4. Multi-Agent and Hybrid AFM Capabilities

AFMs span both single-agent and multi-agent intelligence:

Route-Then-Align Architectures: (Chen et al., 13 Oct 2025) proposes task-aware routing over instant, reasoning, and agentic modes, aligning their generation trajectories in a single backbone model. Instant mode handles simple queries directly; reasoning mode triggers explicit chain-of-thought; agentic mode invokes tool plans and interleaves tool-calling.
Native Multi-Agent Intelligence: (Hu et al., 9 Dec 2025, Mamie et al., 7 Mar 2025) demonstrate that robust multi-agent abilities—understanding others' beliefs and desires, joint planning, compressed communication, rapid adaptation—require explicit architectural and training modifications, not merely large single-agent scaling.
Swarm and Role-Based Orchestration: Agent collectives are constructed as directed acyclic graphs (DAGs) of diverse models and skill specialists. Evolutionary and Lamarckian algorithms optimize topology and communication, while role-specialized prompts maximize diversity and robustness, particularly for tasks that depend more on intensive reasoning than on factual recall (Mamie et al., 7 Mar 2025).
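
A DAG of role-specialized agents can be executed by visiting nodes in topological order and passing each agent the outputs of its upstream agents. The sketch below uses the standard-library `graphlib` module. The agent names and the lambda "agents" are placeholders, and no topology optimization is attempted.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def run_agent_dag(agents: Dict[str, Callable[[List[str]], str]],
                  deps: Dict[str, List[str]]) -> Dict[str, str]:
    """Run each agent after all of its upstream agents, passing their outputs."""
    outputs: Dict[str, str] = {}
    # TopologicalSorter takes {node: predecessors}; it raises CycleError
    # (a ValueError subclass) if the graph is not acyclic.
    for name in TopologicalSorter(deps).static_order():
        upstream = [outputs[d] for d in deps.get(name, [])]
        outputs[name] = agents[name](upstream)
    return outputs


# Hypothetical three-role collective.
agents = {
    "researcher": lambda inputs: "facts",
    "critic": lambda inputs: f"critique({inputs[0]})",
    "writer": lambda inputs: "report[" + ", ".join(inputs) + "]",
}
deps = {
    "researcher": [],
    "critic": ["researcher"],
    "writer": ["researcher", "critic"],
}
out = run_agent_dag(agents, deps)
print(out["writer"])  # report[facts, critique(facts)]
```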

A representative summary table:

Capability	Key Enabler	Empirical Outcome
Dynamic memory	KV-caches + vector DB	Long-horizon adaptation
Multi-agent planning	Joint-policy modules, DAGs	Improved reasoning accuracy
Mode routing	Task-aware router (π_route)	Efficiency, cost savings
Tool use	Plan, API invocation, code exec	Up-to-date computation

5. Performance, Benchmarks, and Real-World Deployments

AFMs are characterized by significant gains in real-world benchmarks and application scenarios:

Benchmarks: State-of-the-art results are reported for web navigation (GAIA: 55.3% pass@1), mathematical reasoning (AIME25: 70.4%), and general knowledge (SuperGPQA: 54.7%) (Chen et al., 13 Oct 2025, Li et al., 6 Aug 2025).
Industrial AFMs: In applied contexts, AFMs improve on conventional agent systems by 37 percentage points (pp) in human interaction and 35 pp in uncertainty handling, but show a pronounced deficit of 39 pp in negotiation (Henkel et al., 4 May 2026).
Practical Deployments: Highly optimized AFM-based chatbots (FastChat, StreamingLLM) reach per-token latencies of 5–50 ms and throughput in the hundreds to thousands of queries per second (QPS), using aggressive acceleration and batching strategies (Xu et al., 2024).
Robustness: Runtime calibration methods such as MARGIN (Armstrong, 21 May 2026) are essential for ensuring reliable agent coordination, overcoming systematic miscalibration of self-reported confidences, and restoring trust in agent selection and ensemble voting under distribution shift.
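
To see why miscalibrated self-reported confidence matters for ensemble voting, consider the generic sketch below. It is not the MARGIN method, whose details are not given here. It simply discounts each agent's stated confidence by a reliability score that is assumed to have been measured on held-out tasks. The function name, the reliability table, and the 0.5 fallback are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def calibrated_vote(votes: List[Tuple[str, str, float]],
                    reliability: Dict[str, float]) -> str:
    """Pick the answer with the highest reliability-discounted confidence.

    votes: (agent, answer, self-reported confidence in [0, 1]).
    reliability: observed accuracy per agent (assumed to be given).
    """
    if not votes:
        raise ValueError("at least one vote is required")
    scores: Dict[str, float] = defaultdict(float)
    for agent, answer, confidence in votes:
        # Unknown agents fall back to a neutral reliability of 0.5.
        scores[answer] += confidence * reliability.get(agent, 0.5)
    return max(scores, key=scores.get)


votes = [("a", "X", 0.99), ("b", "Y", 0.6)]
print(calibrated_vote(votes, {}))                    # X (raw confidence wins)
print(calibrated_vote(votes, {"a": 0.3, "b": 0.9}))  # Y (overconfident "a" discounted)
```

In the second call, agent "a" reports high confidence but has a poor track record, so its vote counts for less than the more reliable agent "b".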

6. Design Patterns, Trade-offs, and Best Practices

Architectural patterns cataloged in (Liu et al., 2024) systematize AFM design along axes such as goal extraction (Passive vs. Proactive Goal Creator), planning style (Single-path vs. Multi-path), coordination (Voting, Role-Based, Debate), and reflection (Self, Cross, Human). Notable trade-offs:

Efficiency vs. Explainability: One-shot query approaches are computationally cheaper but limit auditability; incremental querying and multi-path planning offer transparency at higher cost.
Robustness and Governance: Multimodal guardrails, reflection/voting routines, and tool/agent registries enhance safety, explainability, and extensibility, but incur organizational and maintenance overhead.
Edge/Cloud Strategies: Edge deployments emphasize quantization, token pruning, and small FM variants; cloud deployments leverage hybrid parallelism, monitoring frameworks, and dynamic autoscaling (Xu et al., 2024).
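
The quantization used in edge deployments, such as the W4A8 setting (4-bit weights, 8-bit activations) mentioned earlier, can be illustrated with symmetric per-tensor quantization. This is a simplified pure-Python sketch under that assumption. Production schemes typically add per-channel or per-group scales and calibration data, and the function names are illustrative.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Map floats to signed integers in [-(2^(bits-1) - 1), 2^(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats from quantized integers."""
    return [qi * scale for qi in q]


weights = [0.7, -0.3, 0.12, 0.0]
q_w, s_w = quantize_symmetric(weights, bits=4)      # "W4": integers in [-7, 7]
activations = [1.27, -0.5]
q_a, s_a = quantize_symmetric(activations, bits=8)  # "A8": integers in [-127, 127]

print(q_w)  # [7, -3, 1, 0]
print(q_a)  # [127, -50]
```

The round-trip error of each value is at most half the scale, which is why the coarser 4-bit grid is usually reserved for weights while activations keep 8 bits.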

Best practices recommend a modular, memory-augmented, and resource-aware approach, supported by continuous monitoring and failure-tolerant orchestration.

7. Challenges and Future Directions

Several open challenges define the AFM research frontier:

Sub-10 ms/Token Latency for Large FMs: Hardware–algorithm co-design to bridge the real-time response gap for very large models.
Scalable Multi-Agent Coordination: Enabling dynamic agent teaming, trust management, negotiation, and emergent behaviors in open-agent societies.
Personalization and Security: Lifelong learning, privacy-preserving federated optimization, and adversarial robustness in tool-use.
Semantic Communication: Semantic-aware compression, retrieval-augmented generation (RAG), and on-device adaptation to minimize bandwidth and maximize privacy.
Formal Safety and Governance: Standardized evaluation, transparency protocols, and safety/robustness frameworks for high-stakes deployments (Xu et al., 2024, Hu et al., 9 Dec 2025, Henkel et al., 4 May 2026).

Taken together, these advances position AFMs as potential core infrastructure for domains that demand robust, explainable, and adaptive agentic intelligence, from autonomous vehicles and collaborative robotics to decision support and real-time industrial monitoring.