
LLM-Based Agent Frameworks

Updated 8 December 2025
  • LLM-based agent frameworks are modular systems that decompose agent cognition into perception, planning, action, and memory, enabling autonomous decision-making.
  • They integrate tool APIs and memory mechanisms to support self-reflection, dynamic task delegation, and both single-agent and multi-agent architectures.
  • Rigorous evaluation protocols using benchmarks like test pass rates and code smell reduction guide iterative improvements and practical deployments.

LLM-based agent frameworks constitute a technical foundation for constructing, orchestrating, and evaluating autonomous agents or multi-agent systems driven by the capabilities of advanced LLMs. These frameworks aim to provide robust abstractions, tool integrations, memory systems, planning components, and evaluation protocols that enable LLM-powered agents to perform complex, real-world tasks beyond simple prompting or one-shot language generation. The following sections present a systematic exposition of LLM-based agent frameworks, synthesizing current research on software engineering, scientific pipelines, decision support, empirical developer adoption, and unified modeling principles.

1. Foundational Concepts and Taxonomies

The design of LLM-based agent frameworks is grounded in a modular conception that decomposes agentic cognition into separable but tightly interacting components. A consensus emerges around formalizing the agent loop as a cycle of Perception → Planning → Action → Memory, enhanced by tool integrations and self-verification (Zeng et al., 6 Nov 2025, Mi et al., 6 Apr 2025, Zhao et al., 25 Aug 2025, Zhang et al., 3 Oct 2024, Hassouna et al., 17 Sep 2024); a minimal sketch of this loop appears after the list below. The dominant structures are:

  • Single-Agent, Tool-Based, and Multi-Agent Methods: As categorized in surveys, single-agent pipelines exploit LLM prompt engineering and self-reflection; tool-based models integrate external APIs or knowledge; multi-agent systems organize specialized roles and foster collaboration or competition (Zhao et al., 25 Aug 2025, Li, 9 Jun 2024).
  • Unified Modeling Constructs: Core constructs include the LLM "brain," memory buffers (ephemeral and persistent), planning modules (for goal decomposition), tool-calling APIs, and security/guardrail elements. The LLM-Agent-UMF further delineates a five-module core-agent: planning, memory, profile, action, and security (Hassouna et al., 17 Sep 2024).
  • Communication and Orchestration: Multi-agent frameworks support inter-agent messaging either through direct language channels, structured artifacts (JSON, API calls), or graph-based message-passing topologies (Yang et al., 21 Nov 2024, Zhang et al., 3 Oct 2024, Zhao et al., 25 Aug 2025).
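
To make the loop concrete, the following is a minimal, framework-agnostic sketch in Python. It is illustrative only: `llm_complete`, `Tool`, and the simple `tool: argument` action format are hypothetical placeholders, not the API of any framework cited above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; plug in any completion backend."""
    raise NotImplementedError

@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]

@dataclass
class AgentState:
    goal: str
    memory: List[str] = field(default_factory=list)  # ephemeral working memory
    done: bool = False

def agent_step(state: AgentState, tools: Dict[str, Tool]) -> AgentState:
    # Perception: assemble context from the goal and the most recent memory entries.
    context = f"Goal: {state.goal}\nMemory:\n" + "\n".join(state.memory[-5:])
    # Planning: ask the LLM for the next action in a simple "tool: argument" format.
    plan = llm_complete(f"{context}\nNext action (tool: argument) or FINISH:")
    if plan.strip().upper().startswith("FINISH"):
        state.done = True
        return state
    tool_name, _, argument = plan.partition(":")
    # Action: invoke the selected tool, if it exists.
    tool = tools.get(tool_name.strip())
    observation = tool.run(argument.strip()) if tool else f"unknown tool {tool_name!r}"
    # Memory: persist the action/observation pair for the next iteration.
    state.memory.append(f"{plan} -> {observation}")
    return state
```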

2. Architectural Patterns and Workflow Variants

LLM-driven agent frameworks adopt a variety of workflow architectures, each targeting specific coordination, abstraction, and scalability requirements:

| Design Paradigm | Core Features | Representative Examples |
|---|---|---|
| Monolithic Single-Agent | Unified LLM loop, no inter-agent comms | SDAgent-Single, BabyAGI |
| Role-Decomposed Pipelines | Sequential delegation (Developer, Tester, etc.) | SDAgent-DT, AgentMediation, RefAgent |
| Hierarchical/Manager–Worker | Active manager coordinating passive agents; task routing | LLM-Agent-UMF (hybrid), MetaGPT |
| Ensemble/Debate Models | Sampling/voting or iterative debate | ChatEval, Agent Forest, CMD, MoA |
| DAG/Graph-Structured | Task/agent dependency graphs, flexible flows | LangGraph, LGC-MARL, AgentCoord |
| Concurrent Modular | Asynchronous modules, shared state | CMA (Concurrent Modular Agent) |

Monolithic and pipelined models are prominent in end-to-end software development (SDAgent-Single, SDAgent-DT), whereas ensemble and graph-based coordination excels in settings demanding parallelism, dynamic task assignment, or robustness (Zeng et al., 6 Nov 2025, Oueslati et al., 5 Nov 2025, Aratchige et al., 13 Mar 2025, Jia et al., 13 Mar 2025, Wang et al., 20 Apr 2025).
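
As an illustration of the DAG/graph-structured row in the table above, the sketch below runs agents in topological order over an explicit dependency graph, so each agent sees only the outputs of its declared upstream dependencies. This is a simplified, assumption-laden example (the agent callables and the three-role pipeline are invented for illustration), not the API of LangGraph or LGC-MARL.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

# Each "agent" is just a callable from its upstream agents' outputs to its own output.
AgentFn = Callable[[Dict[str, str]], str]

def run_dag(agents: Dict[str, AgentFn], edges: Dict[str, Set[str]]) -> Dict[str, str]:
    """Execute agents in dependency order; edges[node] holds its prerequisites."""
    outputs: Dict[str, str] = {}
    for name in TopologicalSorter(edges).static_order():
        # Pass each agent only the outputs of its declared upstream dependencies.
        upstream = {dep: outputs[dep] for dep in edges.get(name, set())}
        outputs[name] = agents[name](upstream)
    return outputs

# Hypothetical three-role software pipeline: planner -> developer -> tester.
agents: Dict[str, AgentFn] = {
    "planner":   lambda ctx: "plan: implement feature X",
    "developer": lambda ctx: f"code written from {ctx['planner']!r}",
    "tester":    lambda ctx: f"tests run against {ctx['developer']!r}",
}
edges: Dict[str, Set[str]] = {"planner": set(), "developer": {"planner"}, "tester": {"developer"}}

print(run_dag(agents, edges))
```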

3. Key Evaluation Protocols and Benchmarks

A hallmark of mature agent frameworks is their coupling with rigorous, replicable evaluation suites:

  • Hybrid Evaluation (Software Engineering): E2EDevBench, a dynamically curated benchmark of real-world PyPI projects, pairs automated test-case migration with requirement-driven LLM verification for granular measurement of requirement implementation, test pass-rate, and code coverage improvement (Zeng et al., 6 Nov 2025).
  • Domain-Specific Metrics: Frameworks targeting refactoring (RefAgent), mediation (AgentMediation), or decision support (LLM-driven explainable AI) employ metrics such as unit test pass rate, code smell reduction rate, satisfaction/consensus rates, factor alignment, Nash equilibria computation, and LLM-judge rubrics (Oueslati et al., 5 Nov 2025, Chen et al., 8 Sep 2025, Pehlke et al., 10 Nov 2025); a minimal example of computing two such metrics follows this list.
  • Empirical Adoption Studies: Comparative analyses across ten open-source agent frameworks (LangChain, AutoGen, LangGraph, CrewAI, MetaGPT, LlamaIndex, Swarm, BabyAGI, Camel, Semantic Kernel) interrogate development efficiency, functional abstraction, learning cost, performance optimization, and maintainability based on large-scale developer discussions (Wang et al., 1 Dec 2025).
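
As a minimal illustration of two of the metrics named above, the sketch below aggregates unit test pass rate and code smell reduction rate over a set of evaluation records. The `EvalRecord` structure and the sample numbers are hypothetical and do not come from any of the cited benchmarks.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EvalRecord:
    """Per-project evaluation record (hypothetical fields for illustration)."""
    tests_passed: int
    tests_total: int
    smells_before: int
    smells_after: int

def test_pass_rate(records: List[EvalRecord]) -> float:
    # Fraction of executed test cases that pass after the agent's changes.
    passed = sum(r.tests_passed for r in records)
    total = sum(r.tests_total for r in records)
    return passed / total if total else 0.0

def smell_reduction_rate(records: List[EvalRecord]) -> float:
    # Relative drop in detected code smells, aggregated over all projects.
    before = sum(r.smells_before for r in records)
    after = sum(r.smells_after for r in records)
    return (before - after) / before if before else 0.0

# Made-up results for two projects, for illustration only.
records = [EvalRecord(18, 20, 40, 19), EvalRecord(9, 10, 20, 10)]
print(f"test pass rate: {test_pass_rate(records):.1%}")             # 90.0%
print(f"smell reduction rate: {smell_reduction_rate(records):.1%}") # 51.7%
```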

4. Representative Frameworks and Empirical Performance

Notable agent frameworks instantiate these principles with concrete engineering choices:

  • SWE-Agent Architectures: SDAgent-Single, SDAgent-DT (Developer–Tester), and SDAgent-DDT introduce controlled head-to-head benchmarking under E2EDevBench, revealing that separation of developer and tester roles (DT) improves requirement implementation (≈53.5%) over single-agent or waterfall decompositions, and that planning failures, rather than code emission, overwhelmingly limit performance (Zeng et al., 6 Nov 2025).
  • RefAgent: A multi-agent LLM system for Java refactoring that adopts a pipeline of planning, execution, compilation, and testing agents with self-reflective feedback loops. It achieves state-of-the-art median test pass rates (90%) and code smell reduction (52.5%), and ablation studies show that context retrieval and iterative validation are indispensable (see the retry-loop sketch after this list) (Oueslati et al., 5 Nov 2025).
  • GoalAct: Enforces dynamic global planning and hierarchical skill-based execution, outperforming ReAct baselines by 12.22% on LegalAgentBench and validating the advantages of persistent global objectives and decomposable skills (Searching, Coding, Writing, Finish) (Chen et al., 23 Apr 2025).
  • AgentCoord and CMA: Emphasize the importance of structured intermediate representations and concurrent modular orchestration for debugging, transparency, and resilience (Pan et al., 18 Apr 2024, Maruyama et al., 26 Aug 2025).
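
The self-reflective pipeline pattern used by systems such as RefAgent can be caricatured as a bounded retry loop gated on compilation and test results. The sketch below is schematic: `plan_refactoring`, `apply_refactoring`, `compile_and_test`, and `reflect` are assumed placeholders for LLM or toolchain calls, not RefAgent's actual implementation.

```python
from typing import Tuple

MAX_ITERATIONS = 3

# Placeholders: in a real system each would wrap an LLM call or a build/test toolchain.
def plan_refactoring(source: str, feedback: str) -> str:
    raise NotImplementedError

def apply_refactoring(source: str, plan: str) -> str:
    raise NotImplementedError

def compile_and_test(source: str) -> Tuple[bool, str]:
    raise NotImplementedError

def reflect(plan: str, report: str) -> str:
    raise NotImplementedError

def refactor_with_reflection(source: str) -> str:
    """Plan -> execute -> compile/test -> reflect, repeated until the tests pass."""
    feedback = ""
    for _ in range(MAX_ITERATIONS):
        plan = plan_refactoring(source, feedback)
        candidate = apply_refactoring(source, plan)
        ok, report = compile_and_test(candidate)
        if ok:
            return candidate              # tests pass: accept the refactored code
        feedback = reflect(plan, report)  # feed the failure analysis into the next plan
    return source                         # fall back to the original code
```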

5. Methodological Insights and Bottlenecks

Empirical and ablation studies identify recurrent challenges and suggest mitigation strategies:

  • Bottlenecks: Task planning errors (omission, misinterpretation) constitute the primary failure mode (>55%), surpassing execution and verification gaps. Failure to plan for test coverage, poor requirement decomposition, and insufficient intra-agent self-checking dominate the observed deficiencies, highlighting the need for strengthened comprehension and reflective planning (Zeng et al., 6 Nov 2025).
  • Workflow Structure: Pipelined Developer–Tester arrangements, deep abstraction layers, and modular role specialization improve implementation rates and maintainability; however, increased indirection may introduce cognitive overhead and debugging difficulty (Wang et al., 1 Dec 2025, Aratchige et al., 13 Mar 2025).
  • Communication Overhead: Multi-agent message passing is both a scaling bottleneck and a cost driver, motivating advances such as AgentPrune for systematic pruning of redundant communication edges (achieving up to 72.8% token reduction and improved adversarial robustness) (Zhang et al., 3 Oct 2024); a schematic pruning sketch follows this list.
  • Evaluation Limits: Current benchmarks and evaluation metrics often capture only surface aspects; more nuanced, requirement-centric or process-based evaluation is needed to measure system-level, long-horizon competencies (Wang et al., 20 Apr 2025, Zhao et al., 25 Aug 2025).
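
The idea behind pruning redundant communication edges can be illustrated generically: score each directed sender→receiver edge with some utility estimate and keep only the highest-scoring fraction. The scoring function, the keep ratio, and the four-agent topology below are placeholders for illustration, not the method described in AgentPrune.

```python
from typing import Callable, Dict, Set, Tuple

Edge = Tuple[str, str]  # (sender, receiver)

def prune_edges(edges: Set[Edge],
                utility: Callable[[Edge], float],
                keep_ratio: float = 0.5) -> Set[Edge]:
    """Keep only the highest-utility fraction of communication edges."""
    ranked = sorted(edges, key=utility, reverse=True)
    keep = max(1, int(len(ranked) * keep_ratio))
    return set(ranked[:keep])

# Hypothetical fully connected 4-agent topology with made-up utility scores.
agents = ["planner", "coder", "tester", "reviewer"]
edges: Set[Edge] = {(a, b) for a in agents for b in agents if a != b}
scores: Dict[Edge, float] = {e: (hash(e) % 100) / 100 for e in edges}  # stand-in scores

pruned = prune_edges(edges, utility=lambda e: scores[e], keep_ratio=0.3)
print(f"kept {len(pruned)} of {len(edges)} communication edges")
```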

6. Modeling Frameworks and Design Principles

Systematic modeling frameworks abstract common elements across disparate LLM agent systems:

  • LLM-Agent-UMF: Formalizes the agent system via a 5-tuple core-agent (planning, memory, profile, action, security), distinguishes passive and active agent types, models multi-core architectures (one-active-many-passive as the optimal trade-off), and leverages the ATRAF methodology for rigorous attribute/trade-off evaluation (Hassouna et al., 17 Sep 2024); a schematic rendering of this core-agent follows the list.
  • Computer System Analogies: Von Neumann-inspired modular decomposition (Perception, Cognition, Memory, Tools, Action) and principled layering enforce abstraction, modularity, concurrency, and end-to-end robustness (Mi et al., 6 Apr 2025).
  • Ecosystemic Concepts: LaMAS (LLM–based Multi-Agent Systems) treats each agent as an independently owned entity with a capability vector, private/protected memory, explicit task decomposition, economic incentive mechanisms (Shapley-based credit), and robust privacy/security layers (Yang et al., 21 Nov 2024).
  • Reusable LMPRs: Distillation of LLM-profiled roles (policy, evaluator, dynamic model) enables the standardized assembly of tool-use, planning/search, and feedback/reflection workflows (Li, 9 Jun 2024).
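
The five-module core-agent can be rendered as a small typed container whose run loop consults each module in turn. The interfaces below are illustrative assumptions, not the formal definitions given by LLM-Agent-UMF.

```python
from dataclasses import dataclass
from typing import List, Protocol

class PlanningModule(Protocol):
    def decompose(self, goal: str) -> List[str]: ...

class MemoryModule(Protocol):
    def store(self, item: str) -> None: ...

class ProfileModule(Protocol):
    def role_prompt(self) -> str: ...

class ActionModule(Protocol):
    def execute(self, step: str) -> str: ...

class SecurityModule(Protocol):
    def allow(self, step: str) -> bool: ...

@dataclass
class CoreAgent:
    """Five-module core-agent in the spirit of LLM-Agent-UMF (illustrative only)."""
    planning: PlanningModule
    memory: MemoryModule
    profile: ProfileModule
    action: ActionModule
    security: SecurityModule

    def run(self, goal: str) -> List[str]:
        results: List[str] = []
        for step in self.planning.decompose(goal):
            if not self.security.allow(step):  # guardrail check before acting
                continue
            outcome = self.action.execute(f"{self.profile.role_prompt()}\n{step}")
            self.memory.store(outcome)         # persist the outcome for later recall
            results.append(outcome)
        return results
```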

7. Implications for Practice and Research

The confluence of empirical findings, modeling abstractions, and technical advances yields critical guidance:

  • Developers should select frameworks not by popularity but by ecosystem maturity, composability, and stability under versioning, combining role-specialized modules (e.g., AutoGen + LangChain + LlamaIndex) as required (Wang et al., 1 Dec 2025).
  • Architects are urged to decouple brain, memory, and planning modules, enforce composable APIs/interfaces, and provision for concurrency controls and native caching (Li et al., 5 Mar 2025, Wang et al., 1 Dec 2025); a minimal caching wrapper sketch follows this list.
  • Future Research must address the automated detection of infinite message loops, memory staleness, reflection-based planning and verification, integration of robust reward modeling for RL agents, and domain adaptation to settings requiring rigorous logical guarantees, such as legal AI (L4M) or multi-disciplinary physical design (Chen et al., 26 Nov 2025, Wang et al., 20 Apr 2025).
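
The recommendation to provision for concurrency controls and native caching can be illustrated with a thin wrapper around an arbitrary completion callable: repeated prompts are served from an LRU cache and backend calls are serialized with a lock. The `CachedLLMClient` class and its behavior are assumptions for illustration, not an API from any cited framework.

```python
import threading
from functools import lru_cache
from typing import Callable

class CachedLLMClient:
    """Thread-safe, memoizing wrapper around an arbitrary prompt-completion callable."""

    def __init__(self, complete: Callable[[str], str], max_entries: int = 1024):
        self._complete = complete
        self._lock = threading.Lock()
        # lru_cache supplies the "native caching"; the lock serializes backend calls.
        self._cached = lru_cache(maxsize=max_entries)(self._call)

    def _call(self, prompt: str) -> str:
        with self._lock:
            return self._complete(prompt)

    def __call__(self, prompt: str) -> str:
        return self._cached(prompt)

# Hypothetical usage with a dummy backend standing in for a real LLM client.
client = CachedLLMClient(lambda p: f"echo: {p}")
print(client("plan the release"))  # computed by the backend
print(client("plan the release"))  # served from the cache
```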

LLM-based agent frameworks now provide a modular, empirically validated, and theory-driven basis for constructing scalable, interpretable, and domain-extensible autonomous systems. Their evolution is shaped by interdisciplinary adoption, developer experience, and the continued fusion of symbolic, sub-symbolic, and socio-technical perspectives (Wang et al., 1 Dec 2025, Zhao et al., 25 Aug 2025).
