Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution

This presentation examines a breakthrough framework for cost-efficient evolutionary inference in large language models without external verifiers. Squeeze Evolve orchestrates multiple models by routing computational work based on internal confidence signals, achieving up to 3.3 times cost reduction while matching or exceeding single-model accuracy across mathematics, coding, multimodal vision, and scientific discovery tasks. The framework demonstrates that strategic initialization with strong models combined with confidence-based routing to cheaper models fundamentally shifts the cost-capability frontier for test-time scaling.
Script
Evolutionary inference for large language models hits two walls simultaneously: diversity collapse that kills search capacity, and compute costs that spiral to hundreds of times the cost of standard inference. A new framework called Squeeze Evolve cracks both problems at once by routing work strategically across multiple models based on internal confidence signals.
Without external verifiers to guide search, existing evolutionary pipelines rapidly lose diversity as candidate populations converge onto similar solutions. At the same time, running expensive models through multiple recombination loops costs hundreds of times more than standard inference, making the approach impractical for real deployment.
The authors realized that not all evolutionary steps require the same model capability.
The breakthrough is simple but profound: use the expensive model only where it matters most, which turns out to be population initialization. After that, route candidate groups to cheap or expensive models based on internal confidence scores. High-confidence groups, where candidates agree, go to weak models or simple voting. Low-confidence groups, where candidates disagree, get the strong model. This single routing decision preserves diversity while slashing costs.
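To make the routing step concrete, here is a minimal sketch of what confidence-based dispatch could look like, using candidate agreement within a group as a stand-in for the paper's internal confidence signal. The function name, threshold value, and return convention are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def route_group(candidates, threshold=0.7):
    """Hypothetical router: pick a cheap or expensive path for one group.

    candidates: list of answer strings produced by the current population.
    Agreement (fraction of candidates sharing the modal answer) serves as
    an assumed proxy for internal confidence.
    """
    counts = Counter(candidates)
    top_answer, top_count = counts.most_common(1)[0]
    agreement = top_count / len(candidates)
    if agreement >= threshold:
        # High-confidence group: resolve cheaply by majority vote
        # (or hand off to a weak model).
        return ("vote", top_answer)
    # Low-confidence group: escalate recombination to the strong model.
    return ("strong", None)

# A group that mostly agrees stays on the cheap path;
# a group in disagreement is escalated.
print(route_group(["42", "42", "42", "17"]))
print(route_group(["a", "b", "c", "d"]))
```

The design point the sketch illustrates is that the routing decision is per-group, not per-call: one cheap agreement check decides whether an entire recombination step ever touches the expensive model.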
The results are striking across modalities. On multimodal vision reasoning, pairing a text-only model with a vision-capable model cuts costs by 2.7 times at matched accuracy. The text-only model never even sees the images after initialization, yet it successfully recombines visual reasoning trajectories because initialization quality dominates final performance. This finding reveals something fundamental about how evolutionary search operates in these systems.
The framework scales across mathematics, coding, multimodal vision, and open-ended scientific discovery. On visual reasoning benchmarks, it achieves state-of-the-art accuracy while dramatically reducing cost. Throughput gains are equally impressive: up to 10 times faster under fixed compute budgets. Perhaps most surprisingly, on circle packing discovery where external verification is trivial, internal confidence alone matches the performance of systems that execute and verify every candidate program.
Squeeze Evolve shows that the future of test-time scaling is not about bigger single models, but smarter orchestration of diverse capabilities. Visit EmergentMind.com to explore the full paper and create your own research videos.