Boundary between prompt-based and weight-based learning for compound AI systems

Characterize the boundary between reflective prompt evolution (GEPA) and weight-space reinforcement learning methods (such as Group Relative Policy Optimization with LoRA or full-parameter finetuning) for optimizing compound AI systems, by determining the data and rollout regimes in which each approach is expected to outperform the other.

Background

The paper introduces GEPA, a reflective prompt evolution algorithm that optimizes multi-module LLM systems using natural language feedback and Pareto-based candidate selection. Across multiple tasks, GEPA is shown to outperform GRPO (a reinforcement learning algorithm) with far fewer rollouts, highlighting substantial sample-efficiency advantages for prompt-based optimization.
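To make the selection mechanism concrete, here is a minimal sketch of Pareto-based candidate selection over prompt candidates, assuming a matrix of per-instance scores; the function names and the `mutate` hook are illustrative stand-ins, not GEPA's actual API.

```python
import random

def select_pareto_candidates(scores):
    """Return indices of prompt candidates on the Pareto frontier.

    scores[i][j] is candidate i's score on training instance j.
    A candidate survives if it attains the best score on at least
    one instance, so "specialist" prompts are retained alongside
    the best overall candidate (a simplification of GEPA's
    Pareto-based selection).
    """
    n_candidates, n_instances = len(scores), len(scores[0])
    frontier = set()
    for j in range(n_instances):
        best = max(scores[i][j] for i in range(n_candidates))
        frontier.update(i for i in range(n_candidates) if scores[i][j] == best)
    return sorted(frontier)

def evolve_step(candidates, scores, mutate):
    """One reflective-evolution step: sample a frontier candidate and
    rewrite its prompt; `mutate` is a placeholder for an LLM-driven
    rewrite that uses natural language feedback from rollouts."""
    parent = random.choice(select_pareto_candidates(scores))
    return mutate(candidates[parent])
```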

Despite these results, the authors note that when training data or rollout budgets are abundant, standard weight-space reinforcement learning may surpass prompt-based methods. They explicitly state that the boundary between these learning paradigms is not well understood, motivating a precise characterization of when each approach should be preferred for compound AI systems.
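One way to make this characterization operational is a matched-budget sweep: optimize the same compound system with each method at a series of rollout budgets and locate the crossover point, if any. The harness below is a hypothetical sketch under that framing; `run_gepa`, `run_grpo`, and `eval_fn` are assumed callables, not part of either method's published tooling.

```python
def characterize_boundary(budgets, run_gepa, run_grpo, eval_fn):
    """Sweep rollout budgets and record which paradigm wins at each.

    run_gepa(b) / run_grpo(b) are assumed to optimize the same
    compound system under a budget of b rollouts and return the
    optimized system; eval_fn scores it on a held-out set.
    """
    results = []
    for b in budgets:
        results.append((b, eval_fn(run_gepa(b)), eval_fn(run_grpo(b))))
    # Boundary estimate: the smallest budget at which weight-space RL
    # (GRPO) matches or beats reflective prompt evolution (GEPA).
    crossover = next((b for b, gepa, grpo in results if grpo >= gepa), None)
    return results, crossover
```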

References

The boundary between prompt-based and weight-based learning is not well understood—although GEPA excels when rollouts are expensive, it is likely that weight updates will outperform prompting in regimes with abundant data or when large-scale rollouts are feasible.

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning (arXiv:2507.19457, Agrawal et al., 25 Jul 2025), in Limitations and Future Work.