
GEPA: Parallel Planning & Prompt Optimization

Updated 26 November 2025
  • GEPA denotes two distinct frameworks: a parallel planning algorithm for robotics and an evolutionary prompt optimizer for LLM systems, each aimed at improving efficiency in its domain.
  • GePA*SE leverages heterogeneous edge classification and multi-threaded processing to reduce planning times in complex motion and task planning domains.
  • The Genetic-Pareto optimizer employs reflection and Pareto-front selection to evolve and refine LLM prompts, enhancing performance in multi-hop QA and reasoning tasks.

GEPA refers to two distinct algorithmic frameworks: GePA*SE (Generalized Edge‐Based Parallel A*) for motion planning and the Genetic‐Pareto prompt optimizer for LLM systems. Both approaches extend state-of-the-art optimization techniques within their respective domains by exploiting parallelism or leveraging reflective, search-based prompt evolution.

1. GEPA Terminology and Definitions

GePA*SE: Generalized Edge-Based Parallel A*

GePA*SE operates in graph-based planning domains where a directed graph $G = (V, E)$ encodes the state transitions and edge evaluations may have heterogeneous computational costs. The state space $V \subseteq S$ and the edge set $E \subseteq V \times A \times V$ (where $A$ is the action set) are coupled with a per-edge cost function $c(e) \ge 0$ and an admissible, consistent heuristic $h: V \rightarrow [0, \infty)$. The action space is partitioned into "cheap" ($A^c$) and "expensive" ($A^e$) actions, yielding corresponding edge subsets $E^c, E^e$ such that $E = E^c \cup E^e$. The objective is to compute an optimal or bounded-suboptimal path $\pi$ from the start state $s_0$ to any goal state $g \in \mathcal{G}$ (the goal set), minimizing $\sum_{e \in \pi} c(e)$ while leveraging $N_t$ parallel threads to reduce wall-clock time (Mukherjee et al., 2023).
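A minimal sketch of how this problem specification could be represented in code; the names (`PlanningProblem`, `is_expensive`) and the container choices are illustrative assumptions rather than the data structures of the published implementation:

```python
from dataclasses import dataclass, field
from typing import Callable, Hashable, List, Set, Tuple

State = Hashable
Action = Hashable
Edge = Tuple[State, Action, State]  # (source state, action, successor state)


@dataclass
class PlanningProblem:
    """Graph-search problem with a cheap/expensive partition of the action set."""
    start: State                                   # s_0
    goals: Set[State]                              # goal set
    successors: Callable[[State], List[Edge]]      # outgoing edges of a state
    cost: Callable[[Edge], float]                  # c(e) >= 0
    heuristic: Callable[[State], float]            # admissible, consistent h(s)
    expensive_actions: Set[Action] = field(default_factory=set)  # A^e; remaining actions form A^c

    def is_expensive(self, edge: Edge) -> bool:
        """Edges induced by A^e are evaluated on worker threads; the rest are cheap."""
        return edge[1] in self.expensive_actions
```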

GEPA: Genetic-Pareto Prompt Optimizer

GEPA ("Genetic-Pareto," Editor's term) refers to a prompt optimization algorithm for AI systems utilizing LLMs. GEPA maintains a population of candidate system prompts, iteratively selecting a parent candidate Φk\Phi^k and a module for mutation. Rollouts are conducted to generate execution traces τ\tau and metric scores μ\mu; the system then performs natural language reflection (using the LLM) to propose prompt modifications. Candidate selection is governed by a Pareto frontier over a held-out set, ensuring diversity and exploration in the candidate pool (Agrawal et al., 25 Jul 2025).

2. GePA*SE Algorithmic Framework

GePA*SE generalizes the parallelization strategies of PA*SE (state‐level parallelism) and ePA*SE (edge‐level parallelism) to heterogeneous planning domains. The algorithm maintains two separate priority queues:

  • $OPEN^e$ — for expensive edges, evaluated in parallel on worker threads.
  • $OPEN^c$ — for "dummy" or cheap edges, piggybacked on state expansions in the master thread.

A set $B$ tracks busy states being processed for outgoing expensive edges. The main loop proceeds as follows:

  1. Initialize $g(s) \gets \infty$ and $n(s) \gets 0$ for all $s \in V$; set $g(s_0) = 0$.
  2. Insert a dummy edge $(s_0, \bot, s_0)$ into $OPEN^c$.
  3. At each iteration, select and expand the edge $e$ with minimum priority $f(e) = g(e.s) + w \cdot h(e.s)$, subject to an independence check for bounded suboptimality (sketched in code below):
  • Expand $e$ only if, for every $e' \in OPEN$ with $f(e') < f(e)$, $g(e.s) - g(e'.s) \le w' \, h(e'.s, e.s)$, and, for every $s' \in B$ with $f(s') < f(e)$, $g(e.s) - g(s') \le w' \, h(s', e.s)$.
  4. For $e \in OPEN^c$, perform cheap edge expansions directly on the master thread.
  5. For $e \in OPEN^e$, assign the evaluation to a parallel worker; upon completion, update $g(s')$ and parent references under appropriate locking.

If $E^e = \emptyset$, GePA*SE reduces to PA*SE; if $E^c = \emptyset$, it matches ePA*SE. The ratio $r^c = |E^e| / |E^c|$ controls this hybridization (Mukherjee et al., 2023).
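A minimal sketch of the independence check from step 3, assuming edges are queued as `(f_value, source_state)` pairs and that a pairwise heuristic `h_pair(s', s)` estimating the cost-to-go between two states is available; the names are illustrative, not taken from the reference implementation:

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

State = Hashable
QueuedEdge = Tuple[float, State]  # (f-value of the edge, its source state e.s)


def independent(
    edge: QueuedEdge,
    open_edges: Iterable[QueuedEdge],             # edges currently in OPEN
    busy_states: Iterable[Tuple[float, State]],   # (f-value, state) for states in B
    g: Dict[State, float],                        # current g-values
    h_pair: Callable[[State, State], float],      # pairwise heuristic h(s', s)
    w_prime: float,
) -> bool:
    """True if expanding `edge` now cannot violate the w'-bounded suboptimality guarantee."""
    f_e, s_e = edge
    # Check against every open edge e' with a strictly smaller f-value.
    for f_other, s_other in open_edges:
        if f_other < f_e and g[s_e] - g[s_other] > w_prime * h_pair(s_other, s_e):
            return False
    # Check against every busy state s' with a strictly smaller f-value.
    for f_other, s_other in busy_states:
        if f_other < f_e and g[s_e] - g[s_other] > w_prime * h_pair(s_other, s_e):
            return False
    return True
```

In the full algorithm this check guards every pop from $OPEN^c$ and $OPEN^e$; cheap edges that pass it are expanded on the master thread, while expensive ones are dispatched to one of the $N_t$ worker threads.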

3. GEPA Algorithmic Structure

GEPA adopts a population-based, evolutionary optimization paradigm for prompt design, leveraging reflection and Pareto-dominant candidate selection:

  • Population $\mathcal{P}$: Set of active system instantiations.
  • Parent Selection: Pareto-based stochastic sampling favors candidates optimal for the largest subset of held-out instances.
  • Mutation/Reflection: A meta-prompt ingests traces $\tau$ and feedback $f$, generating an instruction update for a selected module. The updated candidate $\Phi'$ is admitted to $\mathcal{P}$ if it improves minibatch performance.
  • Pareto Frontier: For each held-out instance $i$, every candidate $k$'s best score $s_k(i)$ is tracked; candidate $a$ dominates $b$ if $s_a(i) \ge s_b(i)$ for all $i$ and $s_a(j) > s_b(j)$ for some $j$, and the candidates not dominated by any other constitute the frontier.
  • Crossover (Merge): Merging complementary module-level edits from separate candidate lineages can further enhance outcomes in some settings (Agrawal et al., 25 Jul 2025).

Pseudocode and core formulas specifying the full workflow are provided in (Agrawal et al., 25 Jul 2025).
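A minimal sketch of the instance-wise Pareto-front bookkeeping and parent sampling described above; the per-instance score matrix and the frequency-based sampling weights are simplifying assumptions rather than the exact procedure from the paper:

```python
import random
from typing import Dict, List, Optional, Sequence

Scores = Dict[str, List[float]]  # candidate id -> per-instance scores on the held-out set


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is at least as good on every instance and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Scores) -> List[str]:
    """Candidates whose score vectors are not dominated by any other candidate."""
    return [
        k for k, v in scores.items()
        if not any(dominates(scores[j], v) for j in scores if j != k)
    ]


def sample_parent(scores: Scores, rng: Optional[random.Random] = None) -> str:
    """Stochastically favor frontier candidates that are best on the most instances."""
    rng = rng or random.Random(0)
    front = pareto_front(scores)
    n = len(next(iter(scores.values())))
    best = [max(v[i] for v in scores.values()) for i in range(n)]
    weights = [1 + sum(scores[k][i] == best[i] for i in range(n)) for k in front]
    return rng.choices(front, weights=weights, k=1)[0]


# Toy example: three candidates scored on four held-out instances.
scores = {"base": [0.2, 0.4, 0.1, 0.3], "v1": [0.5, 0.4, 0.1, 0.6], "v2": [0.3, 0.7, 0.2, 0.2]}
print(pareto_front(scores))   # ['v1', 'v2']
print(sample_parent(scores))  # one of the frontier candidates
```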

4. Performance and Experimental Evidence

GePA*SE in Planning

Empirical comparisons on both 2D gridworld and high-dimensional manipulation tasks demonstrate that W-GePA*SE consistently achieves lower planning times across thread counts and edge-cost regimes. Key results (mean planning time in seconds; the final column gives the reduction relative to the best baseline):

| Threads | W-A* | W-PA*SE | W-ePA*SE | W-GePA*SE | Reduction vs. best baseline |
|---|---|---|---|---|---|
| 5 | — | 0.45 | 0.42 | 0.27 | ↓36% |
| 10 | — | 0.31 | 0.28 | 0.17 | ↓39% |
| 50 | — | 0.19 | 0.18 | 0.17 | ↓11% |
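Assuming the reported reduction is computed as $1 - t_{\text{GePA*SE}} / t_{\text{best baseline}}$, the 5-thread row gives $1 - 0.27/0.42 \approx 0.36$, i.e. the listed 36% reduction.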

In real-world robotic settings (PR2 block-assembly), W-GePA*SE achieved planning times 25–40% lower than the best baseline for low-to-medium thread counts, with substantial improvements persisting as thread count increases (Mukherjee et al., 2023).

GEPA for LLM Systems

GEPA displays superior sample efficiency and peak performance compared to both RL-based (GRPO) and Bayesian (MIPROv2) prompt optimizers. Representative results:

| Model | Baseline | MIPROv2 | GRPO | GEPA | GEPA Gain (pp) |
|---|---|---|---|---|---|
| Qwen3-8B | 48.9% | 55.1% | 51.1% | 61.3% | +12.4 |
| GPT-4.1 mini | 52.7% | 59.7% | — | 67.0% | +14.3 |

To match GRPO's best validation score, GEPA required up to 73× fewer rollouts (e.g., HotpotQA: 402 vs 24,000 rollouts) (Agrawal et al., 25 Jul 2025).

5. Practical Implications and Domain Applications

GePA*SE

GePA*SE addresses heterogeneous action evaluation cost in robotics domains, including:

  • Kinodynamic planners combining analytic (cheap) and collision-checking (expensive) steps,
  • Manipulation problems that integrate static primitives with inverse kinematics or optimization,
  • Task planners interleaving symbolic and motion-planning computations.

Robust performance in such partitions is contingent on the ability to separate actions into cheap and expensive categories and the use of thread-safe concurrent data structures. A plausible implication is that GePA*SE's granularity-adaptive strategy generalizes efficiently across diverse robotics workflows, provided appropriate partitioning and heuristic design (Mukherjee et al., 2023).
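As a rough illustration, one way to obtain such a partition is to time each action's edge evaluation on sample states and threshold the average; the timing-based rule below is an assumed heuristic for illustration, not the criterion used in the paper:

```python
import time
from statistics import mean
from typing import Callable, Iterable, Set, Tuple


def partition_actions(
    actions: Iterable[str],
    evaluate: Callable[[str, object], float],   # evaluates one (action, state) edge and returns its cost
    sample_states: Iterable[object],            # assumed non-empty set of representative states
    threshold_s: float = 1e-3,                  # actions averaging slower than this are "expensive"
) -> Tuple[Set[str], Set[str]]:
    """Split the action set into cheap (A^c) and expensive (A^e) by measured evaluation time."""
    cheap: Set[str] = set()
    expensive: Set[str] = set()
    states = list(sample_states)
    for a in actions:
        timings = []
        for s in states:
            t0 = time.perf_counter()
            evaluate(a, s)
            timings.append(time.perf_counter() - t0)
        (expensive if mean(timings) > threshold_s else cheap).add(a)
    return cheap, expensive
```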

GEPA

GEPA is suitable for modular LLM agents featuring multiple instruction-carrying prompts, tool integrations, and interpretable traces. Applications include:

  • Multi-hop QA (HotpotQA),
  • Retrieval-augmented claim verification (HoVer),
  • Tool-based reasoning pipelines,
  • Code optimization at inference time.

The framework yields particular advantage where rollout budgets are limited and task complexity calls for dense, interpretable learning signals rather than reward-only feedback (Agrawal et al., 25 Jul 2025).

6. Limitations and Research Directions

GePA*SE

Limitations include reliance on effective action cost partitioning, sensitivity to locking overheads in high-concurrency scenarios, and the need for robust thread-safe priority queues. Prospective directions include adaptive partitioning strategies and minimizing data-structure contention for scalability (Mukherjee et al., 2023).

GEPA

GEPA currently focuses on prompt-level (instruction) edits, with no support for exemplar-level few-shot optimization. Pareto validation consumes most rollouts—a potential target for budget reduction via dynamic validation schedules. Further work may explore reflection prompt engineering, joint prompt + parameter optimization (e.g., RL seeding), and automated or scalable Pareto-set construction. This suggests possible synergy between GEPA and weight-space adaptation methods (Agrawal et al., 25 Jul 2025).

7. Summary and Outlook

"GEPA" designates two methodologically innovative approaches: GePA*SE unifies parallel planning paradigms for robotics with heterogeneous edge-evaluation costs, while the Genetic-Pareto optimizer leverages LLM-based reflection, evolutionary candidate search, and Pareto-front exploration for efficient prompt optimization. Both offer demonstrable performance gains, grounded in systematic empirical benchmarks, and illustrate the benefits of adapting the granularity of search or optimization steps to domain-specific cost structures and learning signals (Mukherjee et al., 2023, Agrawal et al., 25 Jul 2025).
