GEPA: Parallel Planning & Prompt Optimization

Updated 26 November 2025
  • GEPA names two separate frameworks: a parallel planning algorithm for robotics (GePA*SE) and an evolutionary prompt optimizer for LLM systems (Genetic-Pareto), each aimed at improving efficiency in its own domain.
  • GePA*SE leverages heterogeneous edge classification and multi-threaded processing to reduce planning times in complex motion and task planning domains.
  • The Genetic-Pareto optimizer employs reflection and Pareto-front selection to evolve and refine LLM prompts, enhancing performance in multi-hop QA and reasoning tasks.

GEPA refers to two distinct algorithmic frameworks: GePA*SE (Generalized Edge‐Based Parallel A*) for motion planning and the Genetic‐Pareto prompt optimizer for LLM systems. Both approaches extend state-of-the-art optimization techniques within their respective domains by exploiting parallelism or leveraging reflective, search-based prompt evolution.

1. GEPA Terminology and Definitions

GePA*SE: Generalized Edge-Based Parallel A*

GePA*SE operates in graph-based planning domains where a directed graph $G = (V, E)$ encodes the state transitions and edge evaluations may have heterogeneous computational costs. The state space $V \subseteq S$ and the edge set $E \subseteq V \times A \times V$ (where $A$ is the action set) are coupled with a cost function $c(e) \ge 0$ per edge and an admissible, consistent heuristic $h: V \rightarrow [0, \infty)$. The action space is partitioned into "cheap" ($A^c$) and "expensive" ($A^e$) actions, yielding corresponding edge subsets $E^c, E^e$ such that $E = E^c \cup E^e$. The objective is to compute an optimal or bounded-suboptimal path $\pi$ from the start state $s_0$ to any state in the goal set $\mathcal{G}$, minimizing the path cost $c(\pi) = \sum_{e \in \pi} c(e)$ and leveraging $N_t$ parallel threads to reduce wall-clock time (Mukherjee et al., 2023).

GEPA: Genetic-Pareto Prompt Optimizer

GEPA ("Genetic-Pareto") is a prompt optimization algorithm for AI systems built on LLMs. It maintains a population of candidate system prompts and, at each iteration, selects a parent candidate and one of its modules for mutation. Rollouts of the parent generate execution traces and metric scores; the LLM then reflects on these in natural language to propose prompt modifications. Candidate selection is governed by a Pareto frontier over a held-out set, which preserves diversity and exploration in the candidate pool (Agrawal et al., 25 Jul 2025).
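The propose-and-test iteration described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `gepa_step`, the callback signatures, the random module choice, and the toy rollout and reflection functions are all assumptions made for the example, and the LLM reflection call is replaced by a stub.

```python
import random
from typing import Callable, Dict, List, Tuple

Candidate = Dict[str, str]  # hypothetical representation: module name -> instruction text


def gepa_step(
    population: List[Candidate],
    select_parent: Callable[[List[Candidate]], Candidate],
    rollout: Callable[[Candidate, List[str]], Tuple[List[str], List[float]]],
    reflect: Callable[[str, List[str], List[float]], str],
    minibatch: List[str],
    rng: random.Random,
) -> bool:
    """Run one iteration; return True if the mutated child joined the population."""
    parent = select_parent(population)
    module = rng.choice(sorted(parent))          # assumption: pick a module at random
    traces, scores = rollout(parent, minibatch)  # execution traces and metric scores
    new_instruction = reflect(parent[module], traces, scores)  # stands in for LLM reflection
    child = dict(parent)
    child[module] = new_instruction
    _, child_scores = rollout(child, minibatch)
    if sum(child_scores) > sum(scores):          # admit only on minibatch improvement
        population.append(child)
        return True
    return False


def toy_rollout(cand: Candidate, batch: List[str]) -> Tuple[List[str], List[float]]:
    # Toy metric (assumption): reward instructions that ask for citations.
    score = 1.0 if "cite" in cand["answer"] else 0.0
    return [f"{x}: {cand['answer']}" for x in batch], [score] * len(batch)


def toy_reflect(instruction: str, traces: List[str], scores: List[float]) -> str:
    # A real system would prompt an LLM with the traces and scores here.
    return instruction + " Always cite evidence."


pop: List[Candidate] = [{"answer": "Answer the question."}]
admitted = gepa_step(pop, lambda p: p[0], toy_rollout, toy_reflect,
                     ["q1", "q2"], random.Random(0))
```

In this toy run the reflected instruction improves the minibatch score, so the child is appended to `pop`. The `select_parent` callback is where a Pareto-based sampler would plug in.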

2. GePA*SE Algorithmic Framework

GePA*SE generalizes the parallelization strategies of PA*SE (state‐level parallelism) and ePA*SE (edge‐level parallelism) to heterogeneous planning domains. The algorithm maintains two separate priority queues:

  • A queue of expensive edges, which are evaluated in parallel on worker threads.
  • A queue of "dummy" or cheap edges, which are piggybacked on state expansions in the master thread.

A busy set tracks the states whose outgoing expensive edges are currently being processed. The main loop proceeds as follows:

  1. Initialize the cost-to-come values, $g(s_0) = 0$ for the start state $s_0$ and $g(s) = \infty$ for every other state $s$, and start with an empty busy set.
  2. Insert a dummy edge from the start state $s_0$ into the dummy-edge queue.
  3. At each iteration, select and expand the edge $(s, a)$ with minimum priority, subject to an independence check that preserves bounded suboptimality: only expand $(s, a)$ if its source state $s$ is independent of every higher-priority state still in the queues and of every state in the busy set.
  4. For $a \in A^c$, perform the cheap edge expansions directly.
  5. For $a \in A^e$, assign the evaluation to a parallel worker and, upon completion, update the $g$-values and parent references under appropriate locking.

If $A^e = \emptyset$ (every action is cheap), GePA*SE reduces to PA*SE; if $A^c = \emptyset$ (every action is expensive), it matches ePA*SE. The relative share of cheap and expensive actions controls this hybridization (Mukherjee et al., 2023).
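The split between inline cheap edges and delegated expensive edges can be illustrated with a deliberately simplified sketch. It is not GePA*SE: it omits the independence check and the busy set, and the master thread waits for all of a state's expensive edges before moving on, so it is ordinary A* with per-state parallel edge evaluation. The function name `hybrid_astar`, its callback signatures, and the toy line-graph domain are assumptions made for this example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional, Tuple

EdgeResult = Optional[Tuple[int, float]]  # (successor, edge cost), or None if invalid


def hybrid_astar(
    start: int,
    is_goal: Callable[[int], bool],
    actions: Callable[[int], Iterable[Tuple[str, bool]]],  # yields (action, is_expensive)
    evaluate: Callable[[int, str], EdgeResult],
    h: Callable[[int], float],
    num_threads: int = 4,
) -> Optional[List[int]]:
    g: Dict[int, float] = {start: 0.0}
    parent: Dict[int, Optional[int]] = {start: None}
    open_heap = [(h(start), 0, start)]  # (f-value, tie-breaker, state)
    counter = 1
    closed = set()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        while open_heap:
            _, _, s = heapq.heappop(open_heap)
            if s in closed:
                continue  # stale heap entry
            if is_goal(s):
                path, node = [], s
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            closed.add(s)
            cheap, futures = [], []
            for a, expensive in actions(s):
                if expensive:
                    futures.append(pool.submit(evaluate, s, a))  # worker thread
                else:
                    cheap.append(evaluate(s, a))                 # inline, master thread
            for res in cheap + [f.result() for f in futures]:
                if res is None:
                    continue
                succ, cost = res
                if g[s] + cost < g.get(succ, float("inf")):
                    g[succ] = g[s] + cost
                    parent[succ] = s
                    heapq.heappush(open_heap, (g[succ] + h(succ), counter, succ))
                    counter += 1
    return None


# Toy domain (assumption): states 0..5 on a line; "step" is cheap, "jump" is expensive.
def toy_evaluate(s: int, a: str) -> EdgeResult:
    succ, cost = (s + 1, 1.0) if a == "step" else (s + 3, 2.0)
    return (succ, cost) if succ <= 5 else None


path = hybrid_astar(0, lambda s: s == 5,
                    lambda s: [("step", False), ("jump", True)],
                    toy_evaluate, lambda s: (5 - s) * 2 / 3)
```

The heuristic in the toy call is consistent for these edge costs, so the returned path is cost-optimal. A fuller treatment would keep expanding other independent edges while workers are busy, which is where the independence check and the busy set come in.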

3. GEPA Algorithmic Structure

GEPA adopts a population-based, evolutionary optimization paradigm for prompt design, leveraging reflection and Pareto-dominant candidate selection:

  • Population: The set of active candidate system instantiations.
  • Parent Selection: Pareto-based stochastic sampling favors candidates that are optimal for the largest subset of held-out instances.
  • Mutation/Reflection: A meta-prompt ingests the execution traces and feedback from the parent's rollouts and generates an instruction update for a selected module. The updated candidate is admitted to the population if it improves minibatch performance.
  • Pareto Frontier: For each held-out instance $i$, the best score across candidates is tracked, and the candidates that are not strictly dominated constitute the frontier. A candidate is strictly dominated when another candidate scores at least as well on every instance and strictly better on some instance $j$.
  • Crossover (Merge): Merging complementary module-level edits from separate candidate lineages can further enhance outcomes in some settings (Agrawal et al., 25 Jul 2025).
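The parent-selection and frontier rules in the list above can be sketched as below. This is one plausible reading under stated assumptions, not the published pseudocode: the score matrix layout, the function names `pareto_candidates` and `sample_parent`, and the use of win counts as sampling weights are choices made for this illustration.

```python
import random
from typing import Dict, List


def pareto_candidates(scores: List[List[float]]) -> Dict[int, int]:
    """scores[k][i] is the score of candidate k on held-out instance i.

    Returns {candidate index: number of instances on which it attains the
    best score}, restricted to candidates that are not strictly dominated.
    """
    n_inst = len(scores[0])
    best = [max(row[i] for row in scores) for i in range(n_inst)]
    wins = {k: sum(row[i] == best[i] for i in range(n_inst))
            for k, row in enumerate(scores)}
    winners = [k for k, w in wins.items() if w > 0]

    def dominated(k: int) -> bool:
        # Another winner is at least as good everywhere and strictly better somewhere.
        return any(
            j != k
            and all(scores[j][i] >= scores[k][i] for i in range(n_inst))
            and any(scores[j][i] > scores[k][i] for i in range(n_inst))
            for j in winners
        )

    return {k: wins[k] for k in winners if not dominated(k)}


def sample_parent(scores: List[List[float]], rng: random.Random) -> int:
    """Sample a frontier candidate with probability proportional to its win count."""
    frontier = pareto_candidates(scores)
    ks = list(frontier)
    return rng.choices(ks, weights=[frontier[k] for k in ks], k=1)[0]


# Four candidates scored on three held-out instances (made-up numbers).
example_scores = [
    [0.9, 0.2, 0.5],
    [0.4, 0.8, 0.5],
    [0.3, 0.1, 0.4],
    [0.9, 0.1, 0.5],
]
frontier = pareto_candidates(example_scores)
chosen = sample_parent(example_scores, random.Random(0))
```

In the example, candidate 2 never attains a best score and candidate 3 is strictly dominated by candidate 0, so only candidates 0 and 1 remain eligible as parents.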

Agrawal et al. (25 Jul 2025) provide pseudocode and the core formulas that specify this workflow precisely.

4. Performance and Experimental Evidence

GePA*SE in Planning

Empirical comparisons on both 2D gridworld and high-dimensional manipulation tasks show that W-GePA*SE consistently achieves lower planning times across thread counts and edge-cost regimes. Key results (mean planning time in seconds; the last column gives the reduction relative to the best baseline):

Threads   W-A*   W-PA*SE   W-ePA*SE   W-GePA*SE   Speedup vs. best baseline
5         -      0.45      0.42       0.27        ↓36%
10        -      0.31      0.28       0.17        ↓39%
50        -      0.19      0.18       0.17        ↓11%

In real-world robotic settings (PR2 block-assembly), W-GePA*SE achieved planning times 25–40% lower than the best baseline for low-to-medium thread counts, with substantial improvements persisting as thread count increases (Mukherjee et al., 2023).

GEPA for LLM Systems

GEPA displays superior sample efficiency and peak performance compared to both RL-based (GRPO) and Bayesian (MIPROv2) prompt optimizers. Representative results:

Model         Baseline   MIPROv2   GRPO    GEPA    GEPA Gain (pp)
Qwen3-8B      48.9%      55.1%     51.1%   61.3%   +12.4
GPT-4.1mini   52.7%      59.7%     -       67.0%   +14.3

To match GRPO's best validation score, GEPA required up to 73× fewer rollouts (e.g., HotpotQA: 402 vs 24,000 rollouts) (Agrawal et al., 25 Jul 2025).
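The reported gains and the HotpotQA rollout comparison follow from simple arithmetic on the figures quoted above, worked through here as a check. The helper names `gain_pp` and `rollout_ratio` are illustrative only.

```python
def gain_pp(baseline_pct: float, optimized_pct: float) -> float:
    """Absolute gain in percentage points, rounded to one decimal place."""
    return round(optimized_pct - baseline_pct, 1)


def rollout_ratio(reference_rollouts: int, method_rollouts: int) -> float:
    """How many times more rollouts the reference method used."""
    return reference_rollouts / method_rollouts


qwen_gain = gain_pp(48.9, 61.3)            # GEPA vs. baseline on Qwen3-8B
gpt_gain = gain_pp(52.7, 67.0)             # GEPA vs. baseline on GPT-4.1mini
hotpot_ratio = rollout_ratio(24_000, 402)  # GRPO vs. GEPA rollouts on HotpotQA
```

The two gains reproduce the +12.4 and +14.3 entries in the table. The HotpotQA example works out to roughly a 60-fold difference, which sits below the "up to 73×" maximum quoted across tasks.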

5. Practical Implications and Domain Applications

GePA*SE

GePA*SE addresses heterogeneous action evaluation cost in robotics domains, including:

  • Kinodynamic planners combining analytic (cheap) and collision-checking (expensive) steps,
  • Manipulation problems that integrate static primitives with inverse kinematics or optimization,
  • Task planners interleaving symbolic and motion-planning computations.

Robust performance in such domains depends on being able to separate actions into cheap and expensive categories and on thread-safe concurrent data structures. A plausible implication is that GePA*SE's granularity-adaptive strategy generalizes efficiently across diverse robotics workflows, provided the partitioning and heuristic design are appropriate (Mukherjee et al., 2023).
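The source does not say how the cheap/expensive split is obtained. One simple possibility, shown purely as an assumption, is to time each action's evaluation offline and threshold the median. The function `partition_actions`, the threshold value, and the two toy actions are all hypothetical.

```python
import time
from statistics import median
from typing import Callable, Dict, List, Tuple


def partition_actions(
    actions: Dict[str, Callable[[], object]],
    threshold_s: float,
    trials: int = 5,
) -> Tuple[List[str], List[str]]:
    """Split actions into (cheap, expensive) by median measured evaluation time."""
    cheap: List[str] = []
    expensive: List[str] = []
    for name, evaluate in actions.items():
        samples = []
        for _ in range(trials):
            t0 = time.perf_counter()
            evaluate()
            samples.append(time.perf_counter() - t0)
        (expensive if median(samples) > threshold_s else cheap).append(name)
    return cheap, expensive


# Toy stand-ins: a fast analytic primitive and a slow simulated collision check.
toy_actions = {
    "primitive": lambda: sum(range(10)),
    "collision_check": lambda: time.sleep(0.02),
}
cheap_actions, expensive_actions = partition_actions(toy_actions, threshold_s=0.005)
```

Using the median rather than the mean keeps a single slow outlier from reclassifying an otherwise cheap action. In practice the split may instead be fixed by design, for example by treating every action that requires collision checking as expensive.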

GEPA

GEPA is suitable for modular LLM agents featuring multiple instruction-carrying prompts, tool integrations, and interpretable traces. Applications include:

  • Multi-hop QA (HotpotQA),
  • Retrieval-augmented claim verification (HoVer),
  • Tool-based reasoning pipelines,
  • Code optimization at inference time.

The framework yields particular advantage where rollout budgets are limited and task complexity necessitates dense, interpretable learning signals over crude reward-only feedback (Agrawal et al., 25 Jul 2025).

6. Limitations and Research Directions

GePA*SE

Limitations include reliance on effective action cost partitioning, sensitivity to locking overheads in high-concurrency scenarios, and the need for robust thread-safe priority queues. Prospective directions include adaptive partitioning strategies and minimizing data-structure contention for scalability (Mukherjee et al., 2023).
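To make the thread-safety requirement concrete, the sketch below wraps a binary heap in a single lock and supports decrease-key through lazy deletion. It is a generic illustration rather than the data structure used by the authors; the class name `LockedPriorityQueue` and its methods are assumptions. A single coarse lock is also the kind of contention point the paragraph above identifies as a scalability concern.

```python
import heapq
import itertools
import threading
from typing import Dict, Hashable, List, Optional, Tuple


class LockedPriorityQueue:
    """Min-priority queue guarded by one lock, with lazy decrease-key."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Hashable]] = []
        self._best: Dict[Hashable, float] = {}  # item -> current valid priority
        self._lock = threading.Lock()
        self._tie = itertools.count()           # tie-breaker so items are never compared

    def push_or_decrease(self, item: Hashable, priority: float) -> bool:
        """Insert item, or lower its priority; return False if nothing changed."""
        with self._lock:
            if priority >= self._best.get(item, float("inf")):
                return False
            self._best[item] = priority
            heapq.heappush(self._heap, (priority, next(self._tie), item))
            return True

    def pop(self) -> Optional[Tuple[Hashable, float]]:
        """Remove and return (item, priority) with the smallest priority, or None."""
        with self._lock:
            while self._heap:
                priority, _, item = heapq.heappop(self._heap)
                if self._best.get(item) == priority:  # skip superseded entries
                    del self._best[item]
                    return item, priority
            return None


# Four threads race to set priorities for the same 100 items.
queue = LockedPriorityQueue()


def worker(offset: int) -> None:
    for i in range(100):
        queue.push_or_decrease(i, float(i + offset))


threads = [threading.Thread(target=worker, args=(o,)) for o in (3, 1, 2, 0)]
for t in threads:
    t.start()
for t in threads:
    t.join()

popped = []
entry = queue.pop()
while entry is not None:
    popped.append(entry)
    entry = queue.pop()
```

Whatever the interleaving, each item ends up with the lowest priority any thread offered, and the superseded heap entries are discarded when popped.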

GEPA

GEPA currently focuses on prompt-level (instruction) edits, with no support for exemplar-level few-shot optimization. Pareto validation consumes most of the rollouts, which makes it a natural target for budget reduction via dynamic validation schedules. Further work may explore reflection prompt engineering, joint prompt and parameter optimization (e.g., RL seeding), and automated or scalable Pareto-set construction. This suggests possible synergy between GEPA and weight-space adaptation methods (Agrawal et al., 25 Jul 2025).

7. Summary and Outlook

"GEPA" designates two methodologically innovative approaches: GePA*SE unifies parallel planning paradigms for robotics with heterogeneous edge-evaluation costs, while the Genetic-Pareto optimizer leverages LLM-based reflection, evolutionary candidate search, and Pareto-front exploration for efficient prompt optimization. Both offer demonstrable performance gains, grounded in systematic empirical benchmarks, and illustrate the benefits of adapting the granularity of search or optimization steps to domain-specific cost structures and learning signals (Mukherjee et al., 2023, Agrawal et al., 25 Jul 2025).
