EvoAgentX: Evolving Multi-Agent Workflows

Updated 8 July 2025
  • EvoAgentX is an open-source framework that evolves multi-agent workflows through modular design and integrated evolutionary optimization algorithms.
  • It features a layered architecture that separates basic functions, agent management, workflow execution, evolution, and evaluation for systematic improvements.
  • Its optimization modules, including TextGrad, AFlow, and MIPRO, drive measurable performance gains in reasoning, code generation, and collaborative real-world tasks.

EvoAgentX is an open-source automated framework for evolving multi-agent workflows, designed to optimize the orchestration of LLMs and specialized tools in collaborative, complex task settings. Distinguished by its modular architecture and integrated evolutionary optimization algorithms, EvoAgentX facilitates dynamic, iterative refinement of agent prompts, workflow topology, and agent configurations. The system achieves consistent performance gains across reasoning, code generation, mathematical problem-solving, and real-world multi-agent applications (2507.03616).

1. Architectural Design

EvoAgentX adopts a layered modular architecture comprising five distinct layers, each fulfilling a specialized function to realize automatic workflow generation, execution, and evolutionary improvement:

  • Basic Components Layer: Manages foundational functions such as configuration, logging, file I/O, and storage, ensuring extensibility and integration with diverse LLM providers.
  • Agent Layer: Encapsulates individual agents, each defined as a tuple $a_i = \langle LLM_i, Mem_i, \{Act_i^{(j)}\}_{j=1}^{M} \rangle$, where $LLM_i$ is the LLM, $Mem_i$ the memory module, and $Act_i^{(j)}$ represents the agent’s executable actions.
  • Workflow Layer: Represents tasks and inter-agent coordination as a directed graph $\mathcal{W} = (\mathcal{V}, \mathcal{E})$. Each node (WorkFlowNode) tracks status (PENDING, RUNNING, COMPLETED, FAILED), input/output specifications, and assigned agents, supporting both simple and highly complex dependency structures.
  • Evolving Layer: Implements automated optimization algorithms to refine agent prompts, configurations, and workflow structure. Three optimization modules (Agent Optimizer, Workflow Optimizer, and Memory Optimizer, the last still under development) operate together to enable self-improving multi-agent system (MAS) workflows.
  • Evaluation Layer: Provides both quantitative and qualitative assessment tools. Evaluators range from metric-based (e.g., F1 score, pass@1, solve accuracy) to LLM-based subjective assessments, underpinning robust measurement and feedback for further optimization.

This architecture promotes extensibility, decoupled component development, and systematic evolution of workflows with minimal manual intervention.
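
The agent tuple and workflow node described above can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the class names `AgentSketch` and `NodeSketch`, their fields, and the `act` method are hypothetical and are not taken from the EvoAgentX codebase; only the four status values come from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class NodeStatus(Enum):
    # The four states named in the Workflow Layer description.
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


@dataclass
class AgentSketch:
    """Hypothetical stand-in for the tuple <LLM_i, Mem_i, {Act_i^(j)}>."""
    name: str
    llm: Callable[[str], str]                        # LLM_i: prompt -> completion
    memory: List[str] = field(default_factory=list)  # Mem_i: simple append-only log
    actions: Dict[str, Callable[[str], str]] = field(default_factory=dict)  # Act_i^(j)

    def act(self, action: str, payload: str) -> str:
        # Run one named action and record the outcome in memory.
        result = self.actions[action](payload)
        self.memory.append(f"{action}: {result}")
        return result


@dataclass
class NodeSketch:
    """Hypothetical workflow node: status, I/O specification, assigned agents."""
    name: str
    agents: List[AgentSketch]
    inputs: List[str]
    outputs: List[str]
    status: NodeStatus = NodeStatus.PENDING
```

A node would typically start as `PENDING` and be moved through the other states by whatever component executes the workflow; how EvoAgentX itself does this is not shown here.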

2. Integrated Evolutionary Optimization Algorithms

EvoAgentX’s evolving layer incorporates three distinct MAS optimization algorithms, each targeting a key aspect of the agentic workflow:

  • TextGrad: Refines agent prompt templates through gradient-style prompt tuning, in which LLM-generated textual feedback plays the role of the gradient, together with preference-guided updates, improving agent communication and reasoning within the workflow.
  • AFlow: Specializes in workflow topology optimization. By dynamically reordering workflow nodes and adjusting dependency edges in $\mathcal{W}$, AFlow improves inter-agent coordination and overall task execution flow.
  • MIPRO: Focuses on iterative optimization of agent configurations, including prompt-level settings and tool integration parameters, aligning agents’ behavior more closely with task objectives and environmental demands.

These algorithms enable EvoAgentX to automate the iterative search for more effective MAS configurations, integrating performance feedback from the evaluation layer to guide the optimization process.
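
The feedback loop described above (propose a change, evaluate it, keep it if the score improves) can be sketched generically. The code below is not TextGrad, AFlow, or MIPRO; it is a plain hill-climbing loop over prompt strings, and the names `evolve`, `add_hint`, `count_hints`, and `HINTS` are hypothetical. The scoring function is a toy stand-in for the evaluation layer.

```python
import random
from typing import Callable, List, Tuple


def evolve(
    seed: str,
    mutate: Callable[[str, random.Random], str],
    score: Callable[[str], float],
    generations: int = 20,
    population: int = 4,
    rng_seed: int = 0,
) -> Tuple[str, float]:
    """Repeatedly mutate the current best candidate and keep strict improvements."""
    rng = random.Random(rng_seed)
    best, best_score = seed, score(seed)
    for _ in range(generations):
        # Propose several variants of the current best candidate.
        candidates: List[str] = [mutate(best, rng) for _ in range(population)]
        for cand in candidates:
            s = score(cand)
            if s > best_score:  # evaluation feedback decides what survives
                best, best_score = cand, s
    return best, best_score


# Toy setup (assumption): a prompt scores higher the more of these hints it contains.
HINTS = ["Think step by step.", "Cite the supporting passage.", "Answer concisely."]


def add_hint(prompt: str, rng: random.Random) -> str:
    missing = [h for h in HINTS if h not in prompt]
    return prompt if not missing else prompt + " " + rng.choice(missing)


def count_hints(prompt: str) -> float:
    return float(sum(h in prompt for h in HINTS))


best_prompt, best_value = evolve("Answer the question.", add_hint, count_hints)
```

In a real setting the `score` callable would run the workflow on a validation set and return a metric such as F1 or pass@1, which is far more expensive than this toy counter.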

3. Workflow Representation and Execution

Workflows in EvoAgentX are formalized as directed graphs, supporting high expressiveness for complex task decomposition and dependency management. Each workflow node specifies its associated agents, expected input/output schemas, and runtime status, allowing for both custom and reusable workflow templates.

The system supports automatic conversion of user-defined or externally-sourced task specifications into executable multi-agent workflows. Status propagation and node execution are managed by the workflow layer, with dynamic updates enabled by the evolving layer’s optimization routines.
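
As a rough illustration of executing a directed workflow graph with status propagation, the sketch below runs nodes in dependency order using the standard-library `graphlib` module. The function name `run_workflow`, the dictionary-based graph format, and the policy of marking every node downstream of a failure as FAILED are assumptions made for this example, not documented EvoAgentX behavior.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

Task = Callable[[Dict[str, str]], str]


def run_workflow(
    deps: Dict[str, List[str]],
    tasks: Dict[str, Task],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run each node after its dependencies; return (status, results).

    `deps` maps every node name to the names it depends on. Every node,
    including those without dependencies, must appear as a key.
    """
    status = {name: "PENDING" for name in deps}
    results: Dict[str, str] = {}
    for name in TopologicalSorter(deps).static_order():
        # Assumed policy: a node cannot run if any upstream node did not complete.
        if any(status[d] != "COMPLETED" for d in deps[name]):
            status[name] = "FAILED"
            continue
        status[name] = "RUNNING"
        try:
            upstream = {d: results[d] for d in deps[name]}
            results[name] = tasks[name](upstream)
            status[name] = "COMPLETED"
        except Exception:
            status[name] = "FAILED"
    return status, results


# Three-node chain: plan -> search -> write.
example_deps = {"plan": [], "search": ["plan"], "write": ["search"]}
example_tasks = {
    "plan": lambda up: "outline",
    "search": lambda up: up["plan"] + " + sources",
    "write": lambda up: "report from " + up["search"],
}
example_status, example_results = run_workflow(example_deps, example_tasks)
```

`TopologicalSorter` raises `graphlib.CycleError` on cyclic input, so this sketch only covers acyclic workflows; loops or conditional branches would need a different executor.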

Status Table: Workflow Node Representation

| Field | Description | Example Value |
|---|---|---|
| Status | Node execution state | RUNNING, COMPLETED |
| Agent(s) | Assigned intelligent entities | Agent-1, Agent-2 |
| I/O Spec | Input and output requirements | {"in": "text", ...} |

4. Benchmarking and Empirical Evaluation

EvoAgentX has been evaluated across several challenging benchmarks:

  • HotPotQA (Multi-hop Reasoning): Achieves a 7.44% F1 score improvement over baseline configurations, demonstrating enhanced multi-step reasoning capacity.
  • MBPP (Code Generation): Pass@1 metric improved by 10.00%, indicating significant gains in generating correct Python code from natural language descriptions.
  • MATH (Mathematical Problem Solving): Solve accuracy increased by 10.00%, reflecting more effective decomposition and solution of symbolic mathematical problems.
  • GAIA (Real-world Multi-Agent Tasks): Delivers up to 20.00% overall accuracy improvement across complex, real-world collaborative scenarios.

Detailed evaluation on real-world systems such as Open Deep Research and OWL revealed substantial accuracy increases at multiple levels (e.g., 18.41% overall gain and up to 100.00% at specific evaluation stages).
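
For readers unfamiliar with the metrics named above, the following sketch shows a simplified token-level F1, a pass@1 rate, and the difference between an absolute and a relative gain. It is an assumption-laden illustration: the official benchmark scorers apply extra answer normalization that is omitted here, the text above does not say whether its percentages are absolute points or relative changes, and the numbers in the example are invented purely to show the arithmetic.

```python
from collections import Counter
from typing import List


def token_f1(prediction: str, reference: str) -> float:
    """Simplified token-overlap F1 (lowercased, whitespace-tokenized)."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def pass_at_1(passed: List[bool]) -> float:
    """Fraction of problems whose single generated solution passed all tests."""
    return sum(passed) / len(passed)


def absolute_gain(before: float, after: float) -> float:
    """Difference in percentage points."""
    return after - before


def relative_gain(before: float, after: float) -> float:
    """Change expressed as a percentage of the baseline."""
    return 100 * (after - before) / before


# Invented scores, not results from the paper: 50.0 -> 60.0
points = absolute_gain(50.0, 60.0)    # 10 percentage points
percent = relative_gain(50.0, 60.0)   # 20 percent relative to the baseline
```

The two gain functions can differ by a wide margin for the same pair of scores, which is why it matters which one a reported improvement refers to.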

5. Real-World Applications and Tool Integration

EvoAgentX’s evolutionary optimization methods have been applied to existing multi-agent systems in practical settings, namely Open Deep Research and OWL, evaluated on the GAIA benchmark. The system’s ability to evolve both agent-level strategies and workflow structures has produced measurable improvements in collaborative performance and task outcome quality.

A key aspect is the framework’s capacity for plug-and-play integration with a variety of LLMs and external tool chains via its basic components and agent layers. This modularity enables researchers and practitioners to adapt EvoAgentX to diverse application domains without extensive reconfiguration.
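
One common way to get this kind of plug-and-play behavior is to have agents depend on a small interface rather than on a specific provider. The sketch below uses `typing.Protocol` for that purpose; `LLMBackend`, `EchoBackend`, `ReverseBackend`, `REGISTRY`, and `run_agent` are hypothetical names for illustration and do not describe how EvoAgentX wires in its providers.

```python
from typing import Dict, Protocol


class LLMBackend(Protocol):
    """Anything with a matching generate() method satisfies this interface."""

    def generate(self, prompt: str) -> str: ...


class EchoBackend:
    # Stand-in for a real provider client; makes no network calls.
    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ReverseBackend:
    # A second stand-in, to show that backends are interchangeable.
    def generate(self, prompt: str) -> str:
        return prompt[::-1]


# Backends are looked up by name, so swapping one is a configuration change.
REGISTRY: Dict[str, LLMBackend] = {
    "echo": EchoBackend(),
    "reverse": ReverseBackend(),
}


def run_agent(backend_name: str, prompt: str) -> str:
    """Agent logic sees only the LLMBackend interface, never a concrete class."""
    return REGISTRY[backend_name].generate(prompt)
```

Because `Protocol` uses structural typing, a new backend needs no inheritance; it only has to provide a `generate` method with the same signature.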

6. Open Source Accessibility and Extensibility

EvoAgentX is distributed as open-source software at https://github.com/EvoAgentX/EvoAgentX (2507.03616). The repository provides the full platform implementation, ready-to-use optimization modules, and documentation for custom workflow extension. This accessibility supports reproducible research and accelerates adaptation in new domains.

Planned and ongoing extensions include the development of the Memory Optimizer for advanced experience retention, support for richer tool integration, and explorations in continual agent memory and dynamic team restructuring.

EvoAgentX builds upon foundational advances in agent-based evolutionary computation, neuroevolution, and multi-agent workflow optimization. Its modular architecture is aligned with the trend of integrating powerful LLMs and evolutionary principles for automated agent orchestration (2502.05907, 2406.14228). The three-tiered evolving layer merges prompt engineering, workflow reconfiguration, and agent adaptation into a cohesive, iterative optimization strategy, distinguishing EvoAgentX from previous MAS frameworks that often lack dynamic evolution or require significant manual intervention.

EvoAgentX’s demonstrated performance improvements, extensible architecture, and open access collectively mark it as a significant advancement in the automation and optimization of multi-agent workflows for complex reasoning and real-world coordination tasks.