
Agent Foundation Model

Updated 10 February 2026
  • Agent Foundation Model (AFM) is a unified, parameterized architecture that integrates perception, planning, and multi-agent coordination for autonomous operations.
  • It employs closed-loop workflows with recursive state updates and adaptive routing to efficiently handle multi-modal inputs and complex decision-making.
  • Training techniques like imitation and reinforcement learning ensure robust performance, interpretability, and safe adaptation across various application areas.

An Agent Foundation Model (AFM) is a unified, parameterized model—typically a large transformer or a multi-module neural network—explicitly architected and trained to act as an autonomous agent or collective of agents. Unlike monolithic foundation models that passively map inputs to outputs, AFMs deliberately integrate action selection, environment interaction, sequential decision-making, multi-agent coordination, and reasoning scaffolding within their core architecture. AFMs support perception, planning, communication, memory, meta-reasoning, and tool invocation as intrinsic first-class functions, allowing them to pursue goals adaptively across diverse domains ranging from web automation and robotics to computational pathology and research assistance (Sun et al., 26 May 2025, Hu et al., 9 Dec 2025, Li et al., 6 Aug 2025, Chen et al., 13 Oct 2025).

1. Core Definitions and Architectural Principles

Formally, an AFM can be represented as a neural agent $\mathcal{M}_\theta$ with internal state $s_t$, a set of possible actions $A$, and a learned policy $\pi_\theta(a \mid s_t)$. The state $s_t$ typically encapsulates raw observations, history, memory, and agent-specific embeddings. Unlike traditional single-turn models, AFMs recursively update $s_t$ by processing observations $o_t$, choosing actions $a_t$, and updating internal memory, thus enabling closed-loop, sequential decision-making (Sun et al., 26 May 2025, Li et al., 6 Aug 2025).
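The closed loop described above can be sketched in a few lines of Python. This is an illustrative toy only: the names (`AgentState`, `afm_policy`, `run_episode`) and the trivial policy/environment stand-ins are invented here, not taken from any cited paper; they merely show the recursive update $s_t \to a_t \to o_{t+1} \to s_{t+1}$.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    """Toy stand-in for s_t: observation, history, and working memory."""
    observation: str
    history: list = field(default_factory=list)   # past (action, observation) pairs
    memory: dict = field(default_factory=dict)    # agent-specific working memory

def afm_policy(state: AgentState) -> str:
    """Stand-in for pi_theta(a | s_t): choose an action from the current state."""
    return "terminate" if len(state.history) >= 3 else "act"

def environment(action: str) -> str:
    """Stand-in for the environment returning the next observation o_{t+1}."""
    return f"result of {action}"

def run_episode(state: AgentState) -> AgentState:
    """Closed-loop rollout: observe, act, update state, repeat until terminate."""
    while True:
        action = afm_policy(state)
        if action == "terminate":
            return state
        obs = environment(action)
        state.history.append((action, obs))  # recursive state update
        state.observation = obs

final = run_episode(AgentState(observation="task description"))
print(len(final.history))  # number of perception-action cycles before terminating: 3
```

A real AFM replaces the hand-written policy with a learned $\pi_\theta$ over tokenized state, but the control flow is the same closed loop.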

AFMs are characterized by several unifying traits:

  • Agentic Loop: AFMs model explicit perception–action–reasoning cycles (e.g., vision→act→reason→move in pathology; tool calls→reflection→plan selection in web agents).
  • Multi-Modal Integration: Many AFMs natively fuse text, image, audio, video, and action modalities, achieved via shared transformers, cross-modal attention modules, or multi-head architectures (Durante et al., 2024).
  • Multi-Agent & Modular Decomposition: Advanced AFMs instantiate multiple agent roles (e.g., planner, assigner, validator) or orchestrate swarms of agents with dynamically optimized communication topologies (Hu et al., 9 Dec 2025, Mamie et al., 7 Mar 2025, Sun et al., 30 Nov 2025).
  • Intrinsic Planning, Reflection, and Tool Use: AFMs unify internal reasoning (chain-of-thought, tree-of-thought) with external tool invocation, critique, and human involvement under a shared memory and policy (Fang et al., 1 Aug 2025, Liu et al., 2024).
  • Unified Policy and Value Functions: Many are trained via imitation learning, multi-objective supervised fine-tuning, and RL on structured tasks that encompass both acting and reasoning (Sun et al., 26 May 2025, Chen et al., 13 Oct 2025).
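The "intrinsic tool use" trait above can be made concrete with a minimal dispatcher in which reasoning steps, tool calls, and final answers are all first-class actions handled under one shared memory. Everything here (the action schema, `TOOLS` registry, `dispatch`) is a hypothetical sketch, not an API from the cited works.

```python
# Toy external tool registry; a real AFM would expose search, code exec, etc.
TOOLS = {"add": lambda a, b: a + b}

def dispatch(action: dict, memory: list):
    """Handle one policy-emitted action; reasoning and tool use share memory."""
    kind = action["kind"]
    if kind == "think":
        memory.append(("thought", action["text"]))        # chain-of-thought step
    elif kind == "call_tool":
        result = TOOLS[action["tool"]](*action["args"])   # external invocation
        memory.append(("tool_result", result))
    elif kind == "answer":
        return action["value"]                            # terminate with answer
    return None

memory = []
script = [  # a fixed "policy output" trace, for illustration only
    {"kind": "think", "text": "need 2 + 3"},
    {"kind": "call_tool", "tool": "add", "args": [2, 3]},
    {"kind": "answer", "value": 5},
]
out = None
for act in script:
    out = dispatch(act, memory)
print(out)  # 5
```

The point is architectural: thinking, tool invocation, and answering flow through one policy and one memory, rather than being bolted on around a passive model.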

2. Taxonomy, Functional Scope, and Design Patterns

AFMs span a broad taxonomy defined by architectural, functional, and non-functional criteria (Zhou et al., 2024, Liu et al., 2024):

| Taxonomy Axis | Example Options (not exhaustive) | Explained in |
| --- | --- | --- |
| Input Modality | Text, vision, audio, multi-modal | (Durante et al., 2024, Zhou et al., 2024) |
| Model Composition | Single backbone, mixture-of-experts, ensemble, hybrid | (Hu et al., 9 Dec 2025, Liu et al., 2024) |
| Planning & Reasoning | Chain-of-thought, tree-of-thought, centralized vs. distributed planning | (Li et al., 6 Aug 2025, Sun et al., 26 May 2025, Liu et al., 2024) |
| Cooperation Mechanisms | Voting, role-based, debate, cross-ref. | (Liu et al., 2024, Sun et al., 30 Nov 2025, Mamie et al., 7 Mar 2025) |
| Tool Integration | API calls, UI ops, code exec, search | (Li et al., 6 Aug 2025, Chen et al., 13 Oct 2025, Fang et al., 1 Aug 2025) |
| Reflection | Self-, cross-, human-reflection | (Liu et al., 2024, Fang et al., 1 Aug 2025, Sun et al., 30 Nov 2025) |
| Guardrails/Compliance | Safety filters, RAI hooks, audit logs | (Liu et al., 2024, Zhou et al., 2024) |
| Learning Adaptation | Online, federated, self-improving | (Hu et al., 9 Dec 2025, Mamie et al., 7 Mar 2025) |

Architectural pattern catalogues and decision models have emerged to guide AFM design, facilitating pattern selection for goal intake, prompt engineering, planning, cooperation, safety, and learning. Patterns (e.g., voting-based cooperation, incremental querying, self-reflection) are prioritized by multi-criteria scoring to balance explainability, cost, accuracy, and accountability (Liu et al., 2024).
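Multi-criteria pattern prioritization of the kind described above can be sketched as a simple weighted scoring over candidate patterns. The weights, pattern names, and scores below are invented for illustration; see the cited catalogue (Liu et al., 2024) for the actual patterns and criteria.

```python
# Criteria weights (assumed, not from the paper); higher score is better.
WEIGHTS = {"explainability": 0.3, "cost": 0.2, "accuracy": 0.4, "accountability": 0.1}

# Candidate patterns scored in [0, 1] per criterion ("cost" = cost-efficiency).
PATTERNS = {
    "voting_cooperation": {"explainability": 0.8, "cost": 0.4, "accuracy": 0.7, "accountability": 0.9},
    "incremental_query":  {"explainability": 0.6, "cost": 0.9, "accuracy": 0.5, "accountability": 0.5},
    "self_reflection":    {"explainability": 0.7, "cost": 0.6, "accuracy": 0.8, "accountability": 0.6},
}

def score(pattern: dict) -> float:
    """Weighted sum across criteria: a simple multi-criteria decision rule."""
    return sum(WEIGHTS[c] * pattern[c] for c in WEIGHTS)

best = max(PATTERNS, key=lambda name: score(PATTERNS[name]))
print(best, round(score(PATTERNS[best]), 2))
```

Under these (made-up) weights, accuracy dominates, so the reflection-heavy pattern wins; shifting weight onto cost would instead favor incremental querying, which is the trade-off the decision models formalize.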

3. Agentic Workflows, Planning, and Reflection

AFMs operationalize sequential, adaptive workflows. For example, CPathAgent (Sun et al., 26 May 2025) treats high-resolution pathology diagnosis as an agentic decision problem. Its core loop is:

  • State: $s_t = (x_t, m_t, c_t, r_t)$ with visual view $x_t$, magnification $m_t$, coordinates $c_t$, and reasoning memory $r_t$.
  • Actions: $\{\text{zoom\_in}, \text{zoom\_out}, \text{pan}(dx, dy), \text{terminate}\}$.
  • Policy & Value: $(\pi_\theta, V_\phi)$ parameterized over concatenated visual-language tokens.
  • Training loss: multi-scale supervised imitation spanning patch, region, and slide levels (e.g., $L_{\text{patch}}$, $L_{\text{region}}$, $L_{\text{slide}}$).
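The state/action interface above can be sketched as an immutable state plus action-application function. The concrete semantics (zoom doubling magnification, pan offsets in pixels) are assumptions made here for illustration, not details from the CPathAgent paper.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class PathState:
    """Toy version of s_t = (x_t, m_t, c_t, r_t); the view x_t is omitted."""
    magnification: int        # m_t
    x: int                    # c_t coordinates
    y: int
    reasoning: tuple = ()     # r_t, accumulated reasoning memory

def apply(state: PathState, action: str, dx: int = 0, dy: int = 0) -> PathState:
    """Apply one navigation action and return the successor state."""
    if action == "zoom_in":
        return replace(state, magnification=state.magnification * 2)
    if action == "zoom_out":
        return replace(state, magnification=max(1, state.magnification // 2))
    if action == "pan":
        return replace(state, x=state.x + dx, y=state.y + dy)
    raise ValueError(f"unknown action: {action}")

s = PathState(magnification=4, x=0, y=0)
s = apply(s, "zoom_in")        # inspect a region at higher magnification
s = apply(s, "pan", dx=128)    # move the field of view
print(s.magnification, s.x)    # 8 128
```

The learned policy $\pi_\theta$ would choose among these actions from the rendered view and reasoning memory; `terminate` simply ends the rollout and emits the diagnosis.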

Chain-of-Agents (Li et al., 6 Aug 2025) and A²FM (Chen et al., 13 Oct 2025) generalize this to web/code/math problem-solving: the core loop activates “agents” or “tools,” each acting on persistent system state, with policies choosing which role or mode to execute at each step. Reflection (either as an action or a separate phase) allows iterative self-critique, error correction, and voting-based consensus (Fang et al., 1 Aug 2025, Liu et al., 2024).

Advanced AFMs support adaptive routing: deciding per-query whether to answer instantly, perform detailed reasoning, or invoke external tools, optimizing not only for accuracy but also inference cost (as in A²FM’s adaptive policy optimization) (Chen et al., 13 Oct 2025).
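Per-query routing of this kind reduces, at its simplest, to a mode-selection function with a cost model. The thresholds, cost numbers, and the `route` heuristic below are illustrative placeholders; A²FM learns this decision rather than hard-coding it.

```python
# Relative inference cost per mode (assumed numbers, for illustration only).
MODE_COST = {"instant": 1, "reasoning": 10, "tools": 25}

def route(difficulty: float, needs_external_info: bool) -> str:
    """Pick the cheapest mode expected to solve the query."""
    if needs_external_info:
        return "tools"       # external invocation required regardless of difficulty
    if difficulty < 0.3:
        return "instant"     # answer directly
    return "reasoning"       # deliberate chain-of-thought

queries = [(0.1, False), (0.7, False), (0.5, True)]
modes = [route(d, ext) for d, ext in queries]
total_cost = sum(MODE_COST[m] for m in modes)
print(modes, total_cost)  # ['instant', 'reasoning', 'tools'] 36
```

A learned router replaces the hand-set thresholds with a policy trained against a joint accuracy-plus-cost objective, which is the optimization the text describes.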

4. Multi-Agent, Swarm, and Collective AFMs

Recent work demonstrates that multi-agent architectures—where distinct roles or model instances interact, communicate, and coordinate—yield robust improvements in tasks requiring multi-step reasoning, planning, or real-world embodiment (Hu et al., 9 Dec 2025, Mamie et al., 7 Mar 2025, Sun et al., 30 Nov 2025). Key ingredients:

  • Belief and Theory-of-Mind Modules: Heads that encode distributions over peers’ goals/intents, essential for coordination and negotiation.
  • Hierarchical or Distributed Planning: Two-phase planning where agents propose local plans, negotiate a joint plan, and refine individually.
  • Communication Efficiency: Information bottleneck training compacts messages to maximize relevant throughput and minimize token cost.
  • Meta-Learning/Adaptation: Fast-weight adapters or meta-learners enable rapid adjustment to new partners/environments.
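The two-phase propose-then-negotiate planning listed above can be sketched as follows. The negotiation rule (first capable agent claims each step) is a made-up placeholder for what is, in the cited systems, a learned or bargained joint plan.

```python
def local_plan(agent_skills: set, goal_steps: list) -> list:
    """Phase 1: each agent proposes the subset of steps it can execute."""
    return [s for s in goal_steps if s in agent_skills]

def negotiate(proposals: dict) -> dict:
    """Phase 2: assign each step to exactly one capable agent (first bidder wins)."""
    assignment = {}
    for agent, steps in proposals.items():
        for step in steps:
            assignment.setdefault(step, agent)  # keep the first claimant
    return assignment

goal = ["scan", "grasp", "report"]
proposals = {
    "robot_a": local_plan({"scan", "grasp"}, goal),
    "robot_b": local_plan({"grasp", "report"}, goal),
}
joint = negotiate(proposals)
print(joint)  # {'scan': 'robot_a', 'grasp': 'robot_a', 'report': 'robot_b'}
```

The individual refinement phase would then let each agent re-plan its assigned steps locally; theory-of-mind modules matter precisely when proposals conflict, as with `grasp` here.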

Empirically, native multi-agent intelligence does not emerge spontaneously with scale; specific modules and training (including population-based reinforcement learning and curriculum growth) are required for robust coordination, efficient messaging, and joint adaptation (Hu et al., 9 Dec 2025, Mamie et al., 7 Mar 2025). Swarm optimization (e.g., Society of HiveMind) demonstrates emergent gains in logical reasoning, but limited improvement in factual recall without explicit knowledge injection (Mamie et al., 7 Mar 2025).

5. Training Strategies, Supervision, and RL

AFMs are typically trained in multi-stage or multi-objective paradigms, commonly combining supervised imitation on curated agentic traces with reinforcement learning on structured acting-and-reasoning tasks (Sun et al., 26 May 2025, Chen et al., 13 Oct 2025).

Training data curation is crucial. High-quality agentic traces, multi-agent conversation logs, and cross-domain mixes (web/file/code/reasoning) enable robustness and generalization. For multi-agent AFMs, population-based training, adversarial red-teaming, and curriculum growth are deployed to expose and reinforce emergent coordination and safety behaviors (Hu et al., 9 Dec 2025, Mamie et al., 7 Mar 2025).
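A staged imitation-then-RL recipe can be summarized as a per-stage objective switch. The stage names and the fixed 0.5/0.5 blend below are assumptions paraphrasing the multi-stage, multi-objective description above, not a published loss.

```python
def train_step(stage: str, sft_loss: float, rl_advantage: float) -> float:
    """Return the scalar objective minimized at one update, per training stage."""
    if stage == "imitation":
        return sft_loss                              # supervised fine-tuning on agentic traces
    if stage == "rl":
        return -rl_advantage                         # maximize advantage via its negation
    if stage == "mixed":
        return 0.5 * sft_loss - 0.5 * rl_advantage   # multi-objective blend
    raise ValueError(f"unknown stage: {stage}")

print(round(train_step("mixed", sft_loss=0.8, rl_advantage=0.2), 2))  # 0.3
```

In practice the blend weight is a tuned hyperparameter (or annealed over a curriculum), and the RL term comes from rollouts in the agent's environment rather than a scalar advantage handed in directly.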

6. Evaluation, Benchmarks, and Interpretability

AFM effectiveness is validated on diverse benchmarks spanning web automation, code, mathematical reasoning, and embodied tasks.

Empirically, AFMs have achieved substantial gains in interpretability, task accuracy, efficiency, and cost over both static multi-agent systems and purely reasoning-centric LLMs. In mobile and cloud-edge hybrid scenarios (e.g., LightAgent), AFMs deliver competitive performance with significant reductions in cloud invocation and inference cost (Jiang et al., 24 Oct 2025).

7. Implications, Safety, and Future Directions

AFMs are redefining the landscape of agentic AI by unifying end-to-end trainability, multi-agent and multi-modal reasoning, and transparent, auditable execution pathways. The following directions are shaping ongoing research:

  • Hierarchical and Adaptive Routing: Learning to decide what reasoning mode or agentic protocol to invoke, improving efficiency without sacrificing coverage (Chen et al., 13 Oct 2025).
  • Life-Long Learning and Self-Improvement: Techniques for continual adaptation under agentic RL, meta-policy optimization, and memory-augmented learning are critical to generalization and robustness (Hu et al., 9 Dec 2025, Xiao et al., 20 Mar 2025).
  • Safety and Compliance: Proactive guardrails, adversarial population-based evaluation, and runtime monitors are required to mitigate collusion, norm violations, and error amplification in multi-agent AFMs (Hu et al., 9 Dec 2025, Zhou et al., 2024, Liu et al., 2024).
  • Architectural Pattern Guidance and Decision Models: Systematic, quantitative frameworks for architecture and pattern selection are supporting scalable and context-aware AFM deployment (Zhou et al., 2024, Liu et al., 2024).
  • Embodied and Real-World Deployment: Emerging multi-agent AFMs in robotics and networked environments demonstrate robust human-robot collaboration, dynamic delegation, and closed-loop control, outstripping monolithic FM approaches in real-world adaptability (Sun et al., 30 Nov 2025, Xiao et al., 20 Mar 2025).

In summary, Agent Foundation Models constitute a foundational AI paradigm—embedding perception, reasoning, tool invocation, and multi-agent coordination directly into a unified, trainable substrate, and delivering robust, interpretable, and efficient autonomy across diverse scientific and engineering domains (Sun et al., 26 May 2025, Hu et al., 9 Dec 2025, Li et al., 6 Aug 2025, Chen et al., 13 Oct 2025, Zhou et al., 2024, Liu et al., 2024, Sun et al., 30 Nov 2025, Mamie et al., 7 Mar 2025, Fang et al., 1 Aug 2025, Durante et al., 2024, Jiang et al., 24 Oct 2025, Xiao et al., 20 Mar 2025).
