Self-Improving Language Models with Bidirectional Evolutionary Search

This presentation explains how Bidirectional Evolutionary Search addresses fundamental limitations in language model training and inference. It combines evolutionary recombination operators with backward goal decomposition, enabling models to discover solutions outside the high-probability regions of their own distribution while providing dense, interpretable feedback for efficient self-improvement.
Script
Most language models get stuck searching only where their training tells them to look, trapped in high-probability regions while correct answers hide in unexpected places. The authors introduce Bidirectional Evolutionary Search, a framework that breaks models free by recombining solution paths like genetic crossover and decomposing hard problems into verifiable sub-goals.
The forward search uses four evolutionary operators beyond simple expansion. Combination concatenates different solution tails sharing a common start. Deletion condenses reasoning by removing unnecessary steps. Translocation swaps individual steps between trajectories. Crossover splices the beginning of one path onto the end of another, creating candidates the model would never generate autoregressively.
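To make the four operators concrete, here is a minimal sketch that treats a reasoning trajectory as a plain list of step strings. The function names, signatures, and index conventions are assumptions chosen for illustration; the script does not specify how the authors represent trajectories or pick cut points.

```python
from typing import List, Tuple

# Assumption: a trajectory is an ordered list of reasoning steps.
Trajectory = List[str]


def combination(a: Trajectory, b: Trajectory, shared: int) -> Trajectory:
    """Keep the common start, then concatenate both tails."""
    if a[:shared] != b[:shared]:
        raise ValueError("trajectories do not share the given prefix")
    return a + b[shared:]


def deletion(a: Trajectory, index: int) -> Trajectory:
    """Condense a trajectory by dropping the step at `index`."""
    return a[:index] + a[index + 1:]


def translocation(a: Trajectory, b: Trajectory, i: int, j: int) -> Tuple[Trajectory, Trajectory]:
    """Swap step i of `a` with step j of `b`, returning both new trajectories."""
    a2, b2 = list(a), list(b)
    a2[i], b2[j] = b2[j], a2[i]
    return a2, b2


def crossover(a: Trajectory, b: Trajectory, cut_a: int, cut_b: int) -> Trajectory:
    """Splice the beginning of `a` onto the end of `b`."""
    return a[:cut_a] + b[cut_b:]


path_a = ["s0", "a1", "a2"]
path_b = ["s0", "b1", "b2"]
print(combination(path_a, path_b, shared=1))  # ['s0', 'a1', 'a2', 'b1', 'b2']
print(crossover(path_a, path_b, 2, 2))        # ['s0', 'a1', 'b2']
```

In a real system, the cut points and the step to delete would presumably be chosen by the model or a scoring function rather than passed in by hand.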
The backward search recursively decomposes the global goal into fine-grained sub-goals, building an explicit hierarchy. Each sub-goal gets its own verifier providing continuous intermediate feedback, which means the system knows exactly which reasoning steps succeeded and which failed. This dense signal can exponentially reduce the number of samples needed to solve hard problems: instead of requiring every step to succeed within a single sample, where the success probabilities multiply, the system collects verified sub-goals one at a time.
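The sketch below illustrates both ideas from this part of the script: a goal hierarchy in which each node carries its own verifier, and the sample-count argument behind the claimed reduction. The `SubGoal` class, the string-matching verifiers, and the simplifying assumption that each of k sub-goals is solved independently with the same probability p are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SubGoal:
    name: str
    verifier: Callable[[str], bool]  # hypothetical per-goal check
    children: List["SubGoal"] = field(default_factory=list)


def feedback(goal: SubGoal, candidate: str) -> Dict[str, bool]:
    """Walk the hierarchy and record pass/fail for every goal."""
    report = {goal.name: goal.verifier(candidate)}
    for child in goal.children:
        report.update(feedback(child, candidate))
    return report


def expected_samples_joint(p: float, k: int) -> float:
    """One global verifier: all k sub-goals must succeed in the same sample."""
    return 1.0 / (p ** k)


def expected_samples_decomposed(p: float, k: int) -> float:
    """Per-sub-goal verifiers: each sub-goal is collected on its own."""
    return k / p


root = SubGoal(
    "answer",
    lambda s: "x=2" in s and "y=3" in s,
    [SubGoal("find_x", lambda s: "x=2" in s),
     SubGoal("find_y", lambda s: "y=3" in s)],
)
print(feedback(root, "x=2, y=5"))
# {'answer': False, 'find_x': True, 'find_y': False}
print(expected_samples_joint(0.5, 10))       # 1024.0
print(expected_samples_decomposed(0.5, 10))  # 20.0
```

Under these toy assumptions, the global verifier only reports that the candidate failed, while the sub-goal report shows that `find_x` already succeeded and can be kept. The cost grows as 1/p^k with a single global check but only as k/p when sub-goals are verified separately.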
The theoretical foundation reveals why this matters. Sequential expansion confines candidates to a narrow entropy shell, a typical set bounded in log-probability space. Evolutionary recombination breaks inter-block dependence, producing trajectories with higher surprise that escape the shell. A non-negligible fraction of evolved candidates reaches low-probability regions where correct solutions actually live, regions standard search never explores.
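A toy simulation can illustrate the entropy-shell intuition. It is not the paper's proof, and it swaps the language model for an assumed two-state Markov chain whose next token usually repeats the previous one (the `STAY` parameter is invented for this sketch). Sequences sampled step by step have a characteristic surprise, meaning negative log-probability. Splicing two independently sampled sequences breaks the dependence at the junction, so the spliced children are on average more surprising than their parents.

```python
import math
import random

STAY = 0.9  # assumed probability that the next token repeats the current one


def sample_sequence(rng: random.Random, length: int) -> list:
    """Sample a binary sequence autoregressively from the toy chain."""
    seq = [rng.randrange(2)]
    for _ in range(length - 1):
        prev = seq[-1]
        seq.append(prev if rng.random() < STAY else 1 - prev)
    return seq


def surprise(seq: list) -> float:
    """Negative log-probability of `seq` under the chain, in nats."""
    total = -math.log(0.5)  # the first token is uniform
    for prev, cur in zip(seq, seq[1:]):
        total -= math.log(STAY if cur == prev else 1 - STAY)
    return total


def crossover(a: list, b: list, cut: int) -> list:
    """Splice the beginning of `a` onto the end of `b`."""
    return a[:cut] + b[cut:]


def mean_surprises(n_pairs: int = 2000, length: int = 20, seed: int = 0):
    """Return (mean parent surprise, mean crossover-child surprise)."""
    rng = random.Random(seed)
    parent_total = child_total = 0.0
    for _ in range(n_pairs):
        a = sample_sequence(rng, length)
        b = sample_sequence(rng, length)
        child = crossover(a, b, length // 2)
        parent_total += (surprise(a) + surprise(b)) / 2
        child_total += surprise(child)
    return parent_total / n_pairs, child_total / n_pairs


parents, children = mean_surprises()
print(f"mean parent surprise: {parents:.2f} nats")
print(f"mean child surprise:  {children:.2f} nats")
```

In this toy chain the gap comes entirely from the junction: inside a sampled sequence a token switch happens with probability 0.1, but across two independent parents the tokens on either side of the cut disagree about half the time.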
On logical reasoning benchmarks with Gemma models, Bidirectional Evolutionary Search demonstrates consistent accuracy gains while baseline methods stagnate on hard samples. For multi-hop reasoning with Llama models, the framework achieves accuracy improvements of up to 3.8%, with agents learning active search strategies rather than degenerate guessing. The evolutionary operators, especially translocation, enable recombination of partial reasoning chains into correct solutions.
Bidirectional Evolutionary Search fundamentally reimagines how models discover solutions by escaping distributional constraints and providing interpretable structure through explicit goal decomposition. The framework scales efficiently for both training and inference, opening pathways to more capable reasoning systems. Explore the full paper and create your own research videos at EmergentMind.com.