
LLM-Based Agent Frameworks

Updated 8 December 2025
  • LLM-based agent frameworks are modular systems that decompose agent cognition into perception, planning, action, and memory, enabling autonomous decision-making.
  • They integrate tool APIs and memory mechanisms to support self-reflection, dynamic task delegation, and both single-agent and multi-agent architectures.
  • Rigorous evaluation protocols using benchmarks like test pass rates and code smell reduction guide iterative improvements and practical deployments.

LLM-based agent frameworks constitute a technical foundation for constructing, orchestrating, and evaluating autonomous agents or multi-agent systems driven by the capabilities of advanced LLMs. These frameworks aim to provide robust abstractions, tool integrations, memory systems, planning components, and evaluation protocols that enable LLM-powered agents to perform complex, real-world tasks beyond simple prompting or one-shot language generation. The following sections present a systematic exposition of LLM-based agent frameworks, synthesizing current research on software engineering, scientific pipelines, decision support, empirical developer adoption, and unified modeling principles.

1. Foundational Concepts and Taxonomies

The design of LLM-based agent frameworks is grounded in a modular conception that decomposes agentic cognition into separable but tightly interacting components. A consensus emerges around formalizing the agent loop as a cycle of Perception → Planning → Action → Memory, enhanced by tool integrations and self-verification (Zeng et al., 6 Nov 2025, Mi et al., 6 Apr 2025, Zhao et al., 25 Aug 2025, Zhang et al., 3 Oct 2024, Hassouna et al., 17 Sep 2024); a minimal sketch of this loop appears after the list below. The dominant structures are:

  • Single-Agent, Tool-Based, and Multi-Agent Methods: As categorized in surveys, single-agent pipelines exploit LLM prompt engineering and self-reflection; tool-based models integrate external APIs or knowledge; multi-agent systems organize specialized roles and foster collaboration or competition (Zhao et al., 25 Aug 2025, Li, 9 Jun 2024).
  • Unified Modeling Constructs: Core constructs include the LLM "brain," memory buffers (ephemeral and persistent), planning modules (for goal decomposition), tool-calling APIs, and security/guardrail elements. The LLM-Agent-UMF further delineates a five-module core-agent: planning, memory, profile, action, and security (Hassouna et al., 17 Sep 2024).
  • Communication and Orchestration: Multi-agent frameworks support inter-agent messaging either through direct language channels, structured artifacts (JSON, API calls), or graph-based message-passing topologies (Yang et al., 21 Nov 2024, Zhang et al., 3 Oct 2024, Zhao et al., 25 Aug 2025).
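
To make the loop concrete, the following is a minimal, framework-agnostic sketch in Python. It is illustrative only: `llm_complete`, `Tool`, and the simple `tool: argument` action format are hypothetical placeholders, not the API of any framework cited above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; plug in any completion backend."""
    raise NotImplementedError

@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]

@dataclass
class AgentState:
    goal: str
    memory: List[str] = field(default_factory=list)  # ephemeral working memory
    done: bool = False

def agent_step(state: AgentState, tools: Dict[str, Tool]) -> AgentState:
    # Perception: assemble context from the goal and the most recent memory entries.
    context = f"Goal: {state.goal}\nMemory:\n" + "\n".join(state.memory[-5:])
    # Planning: ask the LLM for the next action in a simple "tool: argument" format.
    plan = llm_complete(f"{context}\nNext action (tool: argument) or FINISH:")
    if plan.strip().upper().startswith("FINISH"):
        state.done = True
        return state
    tool_name, _, argument = plan.partition(":")
    # Action: invoke the selected tool, if it exists.
    tool = tools.get(tool_name.strip())
    observation = tool.run(argument.strip()) if tool else f"unknown tool {tool_name!r}"
    # Memory: persist the action/observation pair for the next iteration.
    state.memory.append(f"{plan} -> {observation}")
    return state
```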

2. Architectural Patterns and Workflow Variants

LLM-driven agent frameworks adopt a variety of workflow architectures, each targeting specific coordination, abstraction, and scalability requirements:

| Design Paradigm | Core Features | Representative Examples |
|---|---|---|
| Monolithic Single-Agent | Unified LLM loop, no inter-agent comms | SDAgent-Single, BabyAGI |
| Role-Decomposed Pipelines | Sequential delegation (Developer, Tester, etc.) | SDAgent-DT, AgentMediation, RefAgent |
| Hierarchical/Manager–Worker | Active manager coordinating passive agents; task routing | LLM-Agent-UMF (hybrid), MetaGPT |
| Ensemble/Debate Models | Sampling/voting or iterative debate | ChatEval, Agent Forest, CMD, MoA |
| DAG/Graph-Structured | Task/agent dependency graphs, flexible flows | LangGraph, LGC-MARL, AgentCoord |
| Concurrent Modular | Asynchronous modules, shared state | CMA (Concurrent Modular Agent) |

Monolithic and pipelined models are prominent in end-to-end software development (SDAgent-Single, SDAgent-DT), whereas ensemble and graph-based coordination excels in settings demanding parallelism, dynamic task assignment, or robustness (Zeng et al., 6 Nov 2025, Oueslati et al., 5 Nov 2025, Aratchige et al., 13 Mar 2025, Jia et al., 13 Mar 2025, Wang et al., 20 Apr 2025).
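
As an illustration of the DAG/graph-structured row in the table above, the sketch below runs agents in topological order over an explicit dependency graph, so each agent sees only the outputs of its declared upstream dependencies. This is a simplified, assumption-laden example (the agent callables and the three-role pipeline are invented for illustration), not the API of LangGraph or LGC-MARL.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

# Each "agent" is just a callable from its upstream agents' outputs to its own output.
AgentFn = Callable[[Dict[str, str]], str]

def run_dag(agents: Dict[str, AgentFn], edges: Dict[str, Set[str]]) -> Dict[str, str]:
    """Execute agents in dependency order; edges[node] holds its prerequisites."""
    outputs: Dict[str, str] = {}
    for name in TopologicalSorter(edges).static_order():
        # Pass each agent only the outputs of its declared upstream dependencies.
        upstream = {dep: outputs[dep] for dep in edges.get(name, set())}
        outputs[name] = agents[name](upstream)
    return outputs

# Hypothetical three-role software pipeline: planner -> developer -> tester.
agents: Dict[str, AgentFn] = {
    "planner":   lambda ctx: "plan: implement feature X",
    "developer": lambda ctx: f"code written from {ctx['planner']!r}",
    "tester":    lambda ctx: f"tests run against {ctx['developer']!r}",
}
edges: Dict[str, Set[str]] = {"planner": set(), "developer": {"planner"}, "tester": {"developer"}}

print(run_dag(agents, edges))
```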

3. Key Evaluation Protocols and Benchmarks

A hallmark of mature agent frameworks is their coupling with rigorous, replicable evaluation suites:

  • Hybrid Evaluation (Software Engineering): E2EDevBench, a dynamically curated benchmark of real-world PyPI projects, pairs automated test-case migration with requirement-driven LLM verification for granular measurement of requirement implementation, test pass-rate, and code coverage improvement (Zeng et al., 6 Nov 2025).
  • Domain-Specific Metrics: Frameworks targeting refactoring (RefAgent), mediation (AgentMediation), or decision support (LLM-driven explainable AI) employ metrics such as unit test pass rate, code smell reduction rate, satisfaction/consensus rates, factor alignment, Nash equilibria computation, and LLM-judge rubrics (Oueslati et al., 5 Nov 2025, Chen et al., 8 Sep 2025, Pehlke et al., 10 Nov 2025); a minimal example of computing two such metrics follows this list.
  • Empirical Adoption Studies: Comparative analyses across ten open-source agent frameworks (LangChain, AutoGen, LangGraph, CrewAI, MetaGPT, LlamaIndex, Swarm, BabyAGI, Camel, Semantic Kernel) interrogate development efficiency, functional abstraction, learning cost, performance optimization, and maintainability based on large-scale developer discussions (Wang et al., 1 Dec 2025).
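
As a minimal illustration of two of the metrics named above, the sketch below aggregates unit test pass rate and code smell reduction rate over a set of evaluation records. The `EvalRecord` structure and the sample numbers are hypothetical and do not come from any of the cited benchmarks.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EvalRecord:
    """Per-project evaluation record (hypothetical fields for illustration)."""
    tests_passed: int
    tests_total: int
    smells_before: int
    smells_after: int

def test_pass_rate(records: List[EvalRecord]) -> float:
    # Fraction of executed test cases that pass after the agent's changes.
    passed = sum(r.tests_passed for r in records)
    total = sum(r.tests_total for r in records)
    return passed / total if total else 0.0

def smell_reduction_rate(records: List[EvalRecord]) -> float:
    # Relative drop in detected code smells, aggregated over all projects.
    before = sum(r.smells_before for r in records)
    after = sum(r.smells_after for r in records)
    return (before - after) / before if before else 0.0

# Made-up results for two projects, for illustration only.
records = [EvalRecord(18, 20, 40, 19), EvalRecord(9, 10, 20, 10)]
print(f"test pass rate: {test_pass_rate(records):.1%}")             # 90.0%
print(f"smell reduction rate: {smell_reduction_rate(records):.1%}") # 51.7%
```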

4. Representative Frameworks and Empirical Performance

Notable agent frameworks instantiate these principles with concrete engineering choices:

  • SWE-Agent Architectures: SDAgent-Single, SDAgent-DT (Developer–Tester), and SDAgent-DDT introduce controlled head-to-head benchmarking under E2EDevBench, revealing that separation of developer and tester roles (DT) improves requirement implementation (≈53.5%) over single-agent or waterfall decompositions, and that planning failures, rather than code emission, overwhelmingly limit performance (Zeng et al., 6 Nov 2025).
  • RefAgent: A multi-agent LLM system for Java refactoring that adopts a pipeline of planning, execution, compilation, and testing agents with self-reflective feedback loops. It achieves state-of-the-art median test pass rates (90%) and code smell reduction (52.5%), and ablation studies show that context retrieval and iterative validation are indispensable (see the retry-loop sketch after this list) (Oueslati et al., 5 Nov 2025).
  • GoalAct: Enforces dynamic global planning and hierarchical skill-based execution, outperforming ReAct baselines by 12.22% on LegalAgentBench and validating the advantages of persistent global objectives and decomposable skills (Searching, Coding, Writing, Finish) (Chen et al., 23 Apr 2025).
  • AgentCoord and CMA: Emphasize the importance of structured intermediate representations and concurrent modular orchestration for debugging, transparency, and resilience (Pan et al., 18 Apr 2024, Maruyama et al., 26 Aug 2025).
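
The self-reflective pipeline pattern used by systems such as RefAgent can be caricatured as a bounded retry loop gated on compilation and test results. The sketch below is schematic: `plan_refactoring`, `apply_refactoring`, `compile_and_test`, and `reflect` are assumed placeholders for LLM or toolchain calls, not RefAgent's actual implementation.

```python
from typing import Tuple

MAX_ITERATIONS = 3

# Placeholders: in a real system each would wrap an LLM call or a build/test toolchain.
def plan_refactoring(source: str, feedback: str) -> str:
    raise NotImplementedError

def apply_refactoring(source: str, plan: str) -> str:
    raise NotImplementedError

def compile_and_test(source: str) -> Tuple[bool, str]:
    raise NotImplementedError

def reflect(plan: str, report: str) -> str:
    raise NotImplementedError

def refactor_with_reflection(source: str) -> str:
    """Plan -> execute -> compile/test -> reflect, repeated until the tests pass."""
    feedback = ""
    for _ in range(MAX_ITERATIONS):
        plan = plan_refactoring(source, feedback)
        candidate = apply_refactoring(source, plan)
        ok, report = compile_and_test(candidate)
        if ok:
            return candidate              # tests pass: accept the refactored code
        feedback = reflect(plan, report)  # feed the failure analysis into the next plan
    return source                         # fall back to the original code
```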

5. Methodological Insights and Bottlenecks

Empirical and ablation studies identify recurrent challenges and suggest mitigation strategies:

  • Bottlenecks: Task planning errors (omission, misinterpretation) constitute the primary failure mode (>55%), surpassing execution and verification gaps. Failure to plan for test coverage, poor requirement decomposition, and insufficient intra-agent self-checking dominate the observed deficiencies, highlighting the need for strengthened comprehension and reflective planning (Zeng et al., 6 Nov 2025).
  • Workflow Structure: Pipelined Developer–Tester arrangements, deep abstraction layers, and modular role specialization improve implementation rates and maintainability; however, increased indirection may introduce cognitive overhead and debugging difficulty (Wang et al., 1 Dec 2025, Aratchige et al., 13 Mar 2025).
  • Communication Overhead: Multi-agent message passing is both a scaling bottleneck and a cost driver, motivating advances such as AgentPrune for systematic pruning of redundant communication edges (achieving up to 72.8% token reduction and improved adversarial robustness) (Zhang et al., 3 Oct 2024); a schematic pruning sketch follows this list.
  • Evaluation Limits: Current benchmarks and evaluation metrics often capture only surface aspects; more nuanced, requirement-centric or process-based evaluation is needed to measure system-level, long-horizon competencies (Wang et al., 20 Apr 2025, Zhao et al., 25 Aug 2025).
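
The idea behind pruning redundant communication edges can be illustrated generically: score each directed sender→receiver edge with some utility estimate and keep only the highest-scoring fraction. The scoring function, the keep ratio, and the four-agent topology below are placeholders for illustration, not the method described in AgentPrune.

```python
from typing import Callable, Dict, Set, Tuple

Edge = Tuple[str, str]  # (sender, receiver)

def prune_edges(edges: Set[Edge],
                utility: Callable[[Edge], float],
                keep_ratio: float = 0.5) -> Set[Edge]:
    """Keep only the highest-utility fraction of communication edges."""
    ranked = sorted(edges, key=utility, reverse=True)
    keep = max(1, int(len(ranked) * keep_ratio))
    return set(ranked[:keep])

# Hypothetical fully connected 4-agent topology with made-up utility scores.
agents = ["planner", "coder", "tester", "reviewer"]
edges: Set[Edge] = {(a, b) for a in agents for b in agents if a != b}
scores: Dict[Edge, float] = {e: (hash(e) % 100) / 100 for e in edges}  # stand-in scores

pruned = prune_edges(edges, utility=lambda e: scores[e], keep_ratio=0.3)
print(f"kept {len(pruned)} of {len(edges)} communication edges")
```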

6. Modeling Frameworks and Design Principles

Systematic modeling frameworks abstract common elements across disparate LLM agent systems:

  • LLM-Agent-UMF: Formalizes the agent system via a 5-tuple core-agent (planning, memory, profile, action, security), distinguishes passive and active agent types, models multi-core architectures (one-active-many-passive as the optimal trade-off), and leverages the ATRAF methodology for rigorous attribute/trade-off evaluation (Hassouna et al., 17 Sep 2024); a schematic rendering of this core-agent follows the list.
  • Computer System Analogies: Von Neumann-inspired modular decomposition (Perception, Cognition, Memory, Tools, Action) and principled layering enforce abstraction, modularity, concurrency, and end-to-end robustness (Mi et al., 6 Apr 2025).
  • Ecosystemic Concepts: LaMAS (LLM–based Multi-Agent Systems) treats each agent as an independently owned entity with a capability vector, private/protected memory, explicit task decomposition, economic incentive mechanisms (Shapley-based credit), and robust privacy/security layers (Yang et al., 21 Nov 2024).
  • Reusable LMPRs: Distillation of LLM-profiled roles (policy, evaluator, dynamic model) enables the standardized assembly of tool-use, planning/search, and feedback/reflection workflows (Li, 9 Jun 2024).
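
The five-module core-agent can be rendered as a small typed container whose run loop consults each module in turn. The interfaces below are illustrative assumptions, not the formal definitions given by LLM-Agent-UMF.

```python
from dataclasses import dataclass
from typing import List, Protocol

class PlanningModule(Protocol):
    def decompose(self, goal: str) -> List[str]: ...

class MemoryModule(Protocol):
    def store(self, item: str) -> None: ...

class ProfileModule(Protocol):
    def role_prompt(self) -> str: ...

class ActionModule(Protocol):
    def execute(self, step: str) -> str: ...

class SecurityModule(Protocol):
    def allow(self, step: str) -> bool: ...

@dataclass
class CoreAgent:
    """Five-module core-agent in the spirit of LLM-Agent-UMF (illustrative only)."""
    planning: PlanningModule
    memory: MemoryModule
    profile: ProfileModule
    action: ActionModule
    security: SecurityModule

    def run(self, goal: str) -> List[str]:
        results: List[str] = []
        for step in self.planning.decompose(goal):
            if not self.security.allow(step):  # guardrail check before acting
                continue
            outcome = self.action.execute(f"{self.profile.role_prompt()}\n{step}")
            self.memory.store(outcome)         # persist the outcome for later recall
            results.append(outcome)
        return results
```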

7. Implications for Practice and Research

The confluence of empirical findings, modeling abstractions, and technical advances yields critical guidance:

  • Developers should select frameworks not by popularity but by ecosystem maturity, composability, and stability under versioning, combining role-specialized modules (e.g., AutoGen + LangChain + LlamaIndex) as required (Wang et al., 1 Dec 2025).
  • Architects are urged to decouple brain, memory, and planning modules, enforce composable APIs/interfaces, and provision for concurrency controls and native caching (Li et al., 5 Mar 2025, Wang et al., 1 Dec 2025); a minimal caching wrapper sketch follows this list.
  • Future Research must address the automated detection of infinite message loops, memory staleness, reflection-based planning and verification, integration of robust reward modeling for RL agents, and domain adaptation to settings requiring rigorous logical guarantees, such as legal AI (L4M) or multi-disciplinary physical design (Chen et al., 26 Nov 2025, Wang et al., 20 Apr 2025).
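
The recommendation to provision for concurrency controls and native caching can be illustrated with a thin wrapper around an arbitrary completion callable: repeated prompts are served from an LRU cache and backend calls are serialized with a lock. The `CachedLLMClient` class and its behavior are assumptions for illustration, not an API from any cited framework.

```python
import threading
from functools import lru_cache
from typing import Callable

class CachedLLMClient:
    """Thread-safe, memoizing wrapper around an arbitrary prompt-completion callable."""

    def __init__(self, complete: Callable[[str], str], max_entries: int = 1024):
        self._complete = complete
        self._lock = threading.Lock()
        # lru_cache supplies the "native caching"; the lock serializes backend calls.
        self._cached = lru_cache(maxsize=max_entries)(self._call)

    def _call(self, prompt: str) -> str:
        with self._lock:
            return self._complete(prompt)

    def __call__(self, prompt: str) -> str:
        return self._cached(prompt)

# Hypothetical usage with a dummy backend standing in for a real LLM client.
client = CachedLLMClient(lambda p: f"echo: {p}")
print(client("plan the release"))  # computed by the backend
print(client("plan the release"))  # served from the cache
```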

LLM-based agent frameworks now provide a modular, empirically validated, and theory-driven basis for constructing scalable, interpretable, and domain-extensible autonomous systems. Their evolution is shaped by interdisciplinary adoption, developer experience, and the continued fusion of symbolic, sub-symbolic, and socio-technical perspectives (Wang et al., 1 Dec 2025, Zhao et al., 25 Aug 2025).
