PromptEvolver: Co-evolving Prompts & Heuristics
- PromptEvolver Architecture is a closed-loop system co-evolving LLM prompts and heuristic algorithms to automatically design and refine combinatorial optimization heuristics.
- It integrates dual evolutionary processes with reflective feedback, island migration, and adaptive mutation strategies to overcome local optima.
- Empirical results show significant reductions in errors for TSP and BPP benchmarks, validating the architecture’s efficiency and robustness.
A PromptEvolver Architecture, as formalized in (Liu et al., 29 Sep 2025), denotes a closed-loop evolutionary system in which both LLM prompts and their induced heuristic algorithms are subject to explicit, experience-guided co-evolution. This architecture is engineered for the automatic design of heuristics in combinatorial optimization problems, and its operational cycle leverages reflective feedback from heuristic execution, diversity maintenance through population structuring, and continual adaptation of the prompt space to overcome local optima.
1. Dual Evolutionary Framework: System Overview
The PromptEvolver Architecture (hereafter "EvoPH") is structured around two tightly coupled evolutionary processes:
- Heuristics Evolution: Maintains a population of candidate heuristic algorithms, evolving each via mutation operators implemented through LLMs.
- Prompt Evolution: Simultaneously mutates and refines the LLM prompts that guide heuristic generation, embedding dynamic mutation strategies and receiving performance-derived feedback.
The full system proceeds in iterative cycles, each comprising initialization, evolutionary steps (selection, mutation, evaluation), feedback aggregation, prompt refinement, population update (including diversity management), and elite migration across subpopulations (“islands”). The workflow is characterized by a closed feedback loop in which execution and performance inform both heuristic and prompt adaptation.
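To make the cycle concrete, the following is a minimal sketch of the state such a system could carry between iterations; the class and field names (Heuristic, Island, EvoPHState, strategy_stats) are illustrative assumptions rather than the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Heuristic:
    code: str                          # LLM-generated heuristic program (source text)
    score: Optional[float] = None      # performance metric, e.g. relative error (lower is better)
    error: Optional[str] = None        # compile/runtime error message, if execution failed

@dataclass
class Island:
    archive: dict = field(default_factory=dict)      # feature descriptor -> best Heuristic in that cell
    experience: list = field(default_factory=list)   # per-generation feedback summaries

@dataclass
class EvoPHState:
    islands: list          # subpopulations evolved largely independently
    prompt_templates: dict  # mutation strategy name -> current prompt template
    strategy_stats: dict    # strategy name -> (successes, attempts), drives adaptive sampling
```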
2. Co-evolution of Prompts and Heuristics
Central to the EvoPH methodology is the meta-evolutionary interplay:
- Prompt Update: After each generation, execution feedback (including performance metrics and error summaries) is distilled and used to revise LLM prompts. Successful prompts are preferentially reinforced, while failure-prone or stagnating prompts are modified or abandoned.
- Strategy Sampling: Prompts contain explicit mutation strategy selections (e.g., parameter tuning, structural modification, rewriting), which are adaptively sampled based on accumulated execution experience; a sketch follows this list. This mechanism enables the system to navigate both local search (refinement) and global search (structural innovation).
- Heuristics Generation: Each new heuristic is synthesized by passing the current prompt—encoding both problem specifications and mutation strategy—to the LLM, resulting in code-level (algorithmic) offspring evaluated for correctness and quality.
- Meta-evolution: Prompts themselves evolve, with their structure and content directly shaped by empirical outcomes, ensuring dynamic responsiveness to observed challenges and solution space topology.
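The strategy-sampling and prompt-construction steps above admit a simple realization. The sketch below assumes an illustrative strategy set, an epsilon-greedy sampler, and a hypothetical prompt layout, none of which are prescribed by the paper.

```python
import random

# Illustrative strategy set; the paper's actual strategy taxonomy may differ.
STRATEGIES = ["parameter_tuning", "structural_modification", "full_rewrite"]

def sample_strategy(strategy_stats, epsilon=0.2):
    """Pick a mutation strategy: usually exploit the empirically best one, sometimes explore."""
    def success_rate(s):
        wins, tries = strategy_stats.get(s, (0, 0))
        return (wins + 1) / (tries + 2)      # Laplace prior keeps unseen strategies viable
    if random.random() < epsilon:
        return random.choice(STRATEGIES)
    return max(STRATEGIES, key=success_rate)

def build_prompt(problem_spec, parent_code, strategy, experience_summary):
    """Assemble the LLM prompt from the problem spec, the sampled mutation strategy,
    and distilled execution experience from previous generations."""
    return (
        f"Problem:\n{problem_spec}\n\n"
        f"Parent heuristic:\n{parent_code}\n\n"
        f"Mutation strategy: {strategy}\n"
        f"Lessons from previous generations:\n{experience_summary}\n"
        "Return an improved heuristic as a complete function."
    )
```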
3. Population Structuring: Island Migration and Elite Selection
The population of heuristics is partitioned into multiple independent subpopulations ("islands"), each maintaining its own archive of elites. This mechanism is formalized as follows:
- Elite Archive ($\mathcal{A}$): Per-island, feature-space-indexed map storing the current best heuristic for each behavioral descriptor cell.
- Feature Mapping ($\phi$): Defines the axes of behavioral diversity (e.g., error type, functional characteristics), projecting each candidate heuristic $h$ to a unique archive position $\phi(h)$.
- Archive Update Rule: $\mathcal{A}[\phi(h)] \leftarrow h$ if the cell $\phi(h)$ is empty or $f(h) < f(\mathcal{A}[\phi(h)])$, where $f$ is a performance metric to be minimized (e.g., relative error).
- Migration: At scheduled intervals, top-performing elites are transferred between islands, promoting global diversity and accelerating escape from local optima.
- Selection Balance: Experience-based signals inform whether selection should emphasize exploitation (choosing consistent winners) or exploration (sampling for diversity).
This approach is analogous to structured evolutionary algorithms in computational biology and population-based optimization, enforcing both depth and breadth in solution discovery.
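A minimal sketch of the archive-update rule and elite migration described above is given below; the descriptor function, migration schedule, and ring topology are simplifying assumptions for illustration.

```python
def update_archive(archive, heuristic, feature_of, score_of):
    """Keep the best (lowest-error) heuristic per behavioral-descriptor cell."""
    cell = feature_of(heuristic)
    incumbent = archive.get(cell)
    if incumbent is None or score_of(heuristic) < score_of(incumbent):
        archive[cell] = heuristic

def migrate(islands, feature_of, score_of, k=1):
    """Copy each island's top-k elites into the next island (assumed ring topology)."""
    n = len(islands)
    for i, island in enumerate(islands):
        elites = sorted(island.archive.values(), key=score_of)[:k]
        target = islands[(i + 1) % n]
        for h in elites:
            update_archive(target.archive, h, feature_of, score_of)
```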
4. Experience-Guided Reflective Feedback
Post-execution, each heuristic candidate is analyzed for:
- Success or Failure: Binary attribution based on correctness and runtime outcome.
- Performance Metrics: Quantitative assessment (e.g., solution cost, error rate) collected for every successfully executed candidate.
- Error Taxonomy: Systematic listing of errors (syntax, logic, runtime), aggregated to inform future mutation strategies.
- Experience Distillation: Aggregation of outcome data across the generation, feeding a summarization module that informs future prompt construction and mutation.
Feedback is then leveraged for:
- Focusing mutation strategies on correcting recurrent failure modalities.
- Fine-tuning prompt templates to penalize stagnation and reward high-impact mutations.
- Adapting mutation frequency, scope, and strategy allocation per prompt, leading to hierarchical prompt structuring (e.g., partitioning error repair from performance optimization).
A core tenet is the continual, incremental learning from empirical execution—prompt evolution is not pre-scripted, but emergently shaped by the solution landscape.
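As an illustration of how one generation's outcomes might be distilled and fed back into prompt refinement, consider the sketch below; the summary fields, error taxonomy, and prompt wording are assumptions, not the paper's exact format.

```python
from collections import Counter

def classify_error(msg):
    """Coarse error taxonomy used only for this sketch."""
    if "SyntaxError" in msg:
        return "syntax"
    if "Error" in msg or "Exception" in msg:
        return "runtime"
    return "logic"

def summarize_generation(children):
    """Aggregate one generation's outcomes into a compact experience record."""
    ok = [h for h in children if h.error is None]
    errors = Counter(classify_error(h.error) for h in children if h.error is not None)
    return {
        "success_rate": len(ok) / max(len(children), 1),
        "best_score": min((h.score for h in ok), default=None),
        "error_taxonomy": dict(errors),          # e.g. {"syntax": 3, "runtime": 1}
    }

def refine_prompt(prompt, summary):
    """Append distilled lessons so the next prompt emphasizes fixing recurrent failures."""
    lessons = (
        f"Previous generation: success rate {summary['success_rate']:.0%}, "
        f"recurrent errors {summary['error_taxonomy']}."
    )
    return prompt + "\n" + lessons
```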
5. Mathematical Formalism, Algorithmic Workflow, and Implementation
Heuristic Search Objective:

$$h^{*} = \arg\min_{h \in \mathcal{H}} f(h),$$

where $h$ is a heuristic program, $\mathcal{H}$ is the heuristic search space, and $f$ is a performance metric (e.g., solution quality on TSP or BPP).

Sample TSP Objective:

$$\min_{\pi} \; \sum_{i=1}^{n-1} d\big(c_{\pi(i)}, c_{\pi(i+1)}\big) + d\big(c_{\pi(n)}, c_{\pi(1)}\big),$$

where $\pi$ ranges over permutations of the $n$ cities $c_1, \dots, c_n$ and $d(\cdot,\cdot)$ is the inter-city distance.

Relative Error:

$$\mathrm{RelErr}(h) = \frac{f(h) - f^{*}}{f^{*}},$$

where $f^{*}$ is the optimal (or best-known) objective value; results are reported as percentages.
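For concreteness, here is a minimal sketch of scoring a generated TSP heuristic with the relative-error metric above; the heuristic interface (a callable returning a city permutation) and the coordinate format are illustrative assumptions.

```python
import math

def tour_length(tour, coords):
    """Total length of a closed tour over 2-D city coordinates."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def relative_error(heuristic, coords, best_known_length):
    """Relative error of a heuristic's tour against the optimal or best-known length."""
    tour = heuristic(coords)   # heuristic is assumed to return a permutation of city indices
    return (tour_length(tour, coords) - best_known_length) / best_known_length
```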
Algorithmic Pseudocode:
```
for generation in range(max_generations):
    for each island:
        select parent(s) using experience-guided strategy
        sample mutation strategy (parameter, structure, heuristic rewrite, etc.)
        construct prompt using experience, mutation strategy, and updated instructions
        generate child heuristic using LLM
        execute child, record success/error and performance
        summarize experience (validity, error, metrics)
        update elite archives (feature-based selection)
    periodically, migrate top elites between islands
    update prompts for next generation via experience-driven refinement
```
6. Empirical Performance and Effectiveness
EvoPH demonstrates state-of-the-art empirical results on established combinatorial optimization benchmarks:
- Traveling Salesman Problem (TSP): EvoPH reduces the Christofides baseline error from 20.64% to 5.17%.
- Bin Packing Problem (BPP): evolves the best-fit heuristic from 28.13% error down to 1.65%.
- Consistently outperforms FunSearch, EoH, mEoH, ReEvo, and initial heuristic baselines in relative error metrics.
- Ablation studies confirm that eliminating core modules (strategy sampling, prompt evolution, island-based selection) leads to significant degradation, emphasizing module synergy.
- Robustness: EvoPH yields a higher proportion of executable heuristics (lower code-error rates) than non-dynamic prompt baselines.
- Qualitative analysis reveals that prompts evolve toward hierarchical, specialized formats (error recovery separated from optimization), and heuristics transition from simple greedy strategies to sophisticated, global-improvement variants.
These results demonstrate the efficacy of the PromptEvolver Architecture in avoiding local optima and efficiently traversing the search space.
7. Significance and Broader Context
EvoPH embodies a principled shift in automated algorithm design: not only does the algorithmic space co-evolve, but the process of mutation (via prompt instruction) becomes itself an adaptive object of search. Prompts, under this regime, serve as dynamic controllers whose semantics and operational directives are tied directly to performance outcomes—a marked departure from static prompt templates found in earlier LLM-based automatic programming and metaheuristic frameworks.
The architecture operationalizes population diversity through structured subpopulations and migration, leverages reflective feedback for adaptive mutation, and integrates strategy sampling for both local and global search efficacy. In doing so, it establishes a high watermark for LLM-based automatic algorithm synthesis in combinatorial domains and suggests a generalizable template for other classes of automatic design in AI and computational optimization.
A plausible implication is that similar PromptEvolver Architectures could be extended to broader settings—such as program synthesis, planning, and symbolic regression—where prompt conditioning and diversification are bottlenecks, and where empirical, structured feedback can drive meta-evolutionary improvement.