ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution (2509.19349v1)

Published 17 Sep 2025 in cs.CL and cs.LG

Abstract: We introduce ShinkaEvolve: a new open-source framework leveraging LLMs to advance scientific discovery with state-of-the-art performance and unprecedented efficiency. Recent advances in scaling inference time compute of LLMs have enabled significant progress in generalized scientific discovery. These approaches rely on evolutionary agentic harnesses that leverage LLMs as mutation operators to generate candidate solutions. However, current code evolution methods suffer from critical limitations: they are sample inefficient, requiring thousands of samples to identify effective solutions, and remain closed-source, hindering broad adoption and extension. ShinkaEvolve addresses these limitations, introducing three key innovations: a parent sampling technique balancing exploration and exploitation, code novelty rejection-sampling for efficient search space exploration, and a bandit-based LLM ensemble selection strategy. We evaluate ShinkaEvolve across diverse tasks, demonstrating consistent improvements in sample efficiency and solution quality. ShinkaEvolve discovers a new state-of-the-art circle packing solution using only 150 samples, designs high-performing agentic harnesses for AIME mathematical reasoning tasks, identifies improvements to ALE-Bench competitive programming solutions, and discovers novel mixture-of-expert load balancing loss functions that illuminate the space of optimization strategies. Our results demonstrate that ShinkaEvolve achieves broad applicability with exceptional sample efficiency. By providing open-source accessibility and cost-efficiency, this work democratizes open-ended discovery across diverse computational problems.

Summary

The paper introduces three techniques to boost sample efficiency: a parent sampling scheme that balances exploration and exploitation, code novelty rejection sampling, and a bandit-based LLM ensemble selection strategy.
It reports a new state-of-the-art circle packing solution found with only 150 samples.
It is released as open source and is applied to diverse tasks, including agentic harnesses for AIME mathematical reasoning, ALE-Bench competitive programming solutions, and mixture-of-experts load balancing losses.

ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution

ShinkaEvolve is an open-source framework for LLM-driven program evolution. It targets the sample inefficiency of current code evolution methods, which can require thousands of samples to find effective solutions, by introducing three algorithmic innovations. The paper evaluates these innovations across several tasks and reports consistent improvements in sample efficiency and solution quality.

ShinkaEvolve Framework

ShinkaEvolve employs a structured evolutionary strategy designed to efficiently explore program space. The method integrates three core innovations: a parent selection technique that balances exploration and exploitation, code novelty rejection sampling, and a bandit-based LLM ensemble selection strategy.

Figure 1: High-level overview of ShinkaEvolve. Left: The ShinkaEvolve framework constructs an archive of evaluated programs, rejection-samples new programs, and evaluates their fitness. Right: ShinkaEvolve provides a sample efficient alternative to AlphaEvolve and outperforms its Circle Packing solution.

The combination of these techniques allows ShinkaEvolve to outperform existing frameworks, particularly on the circle packing problem, where it found a new state-of-the-art solution using only 150 samples.
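The bandit-based LLM ensemble selection mentioned above can be illustrated with a small sketch. The summary does not say which bandit algorithm or reward signal the paper uses, so the code below assumes plain UCB1 with a user-supplied reward (for example, the fitness improvement of a child over its parent). The class name `UCB1ModelSelector`, the `exploration` parameter, and the model names are hypothetical and serve only as illustration.

```python
import math


class UCB1ModelSelector:
    """Illustrative UCB1 bandit over a set of LLMs (an assumed algorithm,
    not necessarily the one used in the paper)."""

    def __init__(self, model_names, exploration=1.0):
        self.models = list(model_names)
        self.exploration = exploration  # hypothetical scaling of the bonus term
        self.counts = {m: 0 for m in self.models}
        self.mean_reward = {m: 0.0 for m in self.models}

    def select(self):
        # Try every model once before relying on the statistics.
        for m in self.models:
            if self.counts[m] == 0:
                return m
        total = sum(self.counts.values())

        def score(m):
            bonus = math.sqrt(2.0 * math.log(total) / self.counts[m])
            return self.mean_reward[m] + self.exploration * bonus

        return max(self.models, key=score)

    def update(self, model, reward):
        # Incremental update of the running mean reward for this model.
        self.counts[model] += 1
        n = self.counts[model]
        self.mean_reward[model] += (reward - self.mean_reward[model]) / n


if __name__ == "__main__":
    selector = UCB1ModelSelector(["model_a", "model_b"])
    for _ in range(10):
        chosen = selector.select()
        # Placeholder reward: pretend model_b always yields an improvement.
        selector.update(chosen, 1.0 if chosen == "model_b" else 0.0)
    print(selector.counts)
```

In such a setup, models whose mutations tend to improve fitness are queried more often, while the bonus term keeps occasionally revisiting the others.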

Parent and Inspiration Sampling

A critical component of ShinkaEvolve's efficiency is its parent sampling mechanism, which balances exploiting known high-quality solutions against exploring new areas of the solution space. It manages this trade-off with power-law and weighted sampling.

Figure 2: ShinkaEvolve Parent Sampling. The strategies range from pure exploration (uniform sampling) to pure exploitation (hill-climbing) to a combination of performance and novelty.
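As a minimal sketch of the power-law idea, one possible formulation ranks archived programs by fitness and selects a parent with probability proportional to rank raised to a negative exponent. The exact formula used in the paper is not given here, so the function names and the exponent `alpha` are assumptions. With `alpha = 0` the scheme reduces to uniform sampling (pure exploration), and as `alpha` grows it approaches hill-climbing on the best program (pure exploitation).

```python
import random


def power_law_weights(fitnesses, alpha=1.0):
    """Return selection probabilities proportional to rank ** (-alpha).

    Rank 1 is the fittest program. `alpha` is a hypothetical exponent
    controlling how strongly selection favors top-ranked programs.
    """
    if not fitnesses:
        raise ValueError("archive must not be empty")
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i], reverse=True)
    weights = [0.0] * len(fitnesses)
    for rank, idx in enumerate(order, start=1):
        weights[idx] = float(rank) ** (-alpha)
    total = sum(weights)
    return [w / total for w in weights]


def sample_parent(archive, fitnesses, alpha=1.0, rng=random):
    """Draw one parent program from the archive using power-law rank weights."""
    probs = power_law_weights(fitnesses, alpha)
    return rng.choices(archive, weights=probs, k=1)[0]


if __name__ == "__main__":
    programs = ["prog_a", "prog_b", "prog_c"]  # placeholder program ids
    scores = [0.1, 0.9, 0.5]
    print(power_law_weights(scores, alpha=0.0))  # uniform over the archive
    print(power_law_weights(scores, alpha=1.0))  # favors the best program
    print(sample_parent(programs, scores, alpha=1.0))
```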

Program Mutation and Novelty Rejection

To foster diverse program generation, ShinkaEvolve combines LLM-guided mutations with novelty rejection sampling. An embedding-based novelty check filters out redundant candidates, so that only meaningfully novel programs proceed to evaluation.

Figure 3: ShinkaEvolve Program Novelty Rejection Sampling. ShinkaEvolve embeds mutable code snippets and computes their similarities across the archive; if the maximal score exceeds a threshold, another LLM is queried to assess whether the program is meaningfully novel.
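The check described in Figure 3 could be sketched as follows. This is an illustration under stated assumptions rather than the framework's implementation: `toy_embed` is a bag-of-tokens stand-in for a real embedding model, the `threshold` value of 0.95 is arbitrary, and `llm_judge` is a hypothetical callback representing the second LLM that decides whether a near-duplicate is still meaningfully novel.

```python
import math
from collections import Counter


def toy_embed(code):
    """Stand-in embedding: token counts (a real system would use an embedding model)."""
    return Counter(code.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_accepted(candidate, archive, threshold=0.95, llm_judge=None, embed=toy_embed):
    """Return True if the candidate should proceed to evaluation.

    Candidates whose maximal similarity to the archive stays at or below the
    threshold pass directly. Otherwise the optional judge decides; without a
    judge, the near-duplicate is rejected.
    """
    cand_vec = embed(candidate)
    max_sim = max((cosine(cand_vec, embed(p)) for p in archive), default=0.0)
    if max_sim <= threshold:
        return True
    if llm_judge is None:
        return False
    return bool(llm_judge(candidate, archive))


if __name__ == "__main__":
    archive = ["x = 1 + 2"]
    print(is_accepted("x = 1 + 2", archive))          # near-duplicate
    print(is_accepted("def f(): return 3", archive))  # dissimilar code
```

Rejecting a near-duplicate before evaluation saves one fitness evaluation, which is where the sample-efficiency benefit would come from.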

Results and Analysis

The framework was evaluated on several tasks and showed improvements in both sample efficiency and solution quality over existing approaches. Notably, on the circle packing task, ShinkaEvolve found a new state-of-the-art solution within 150 program evaluations.

Figure 4: ShinkaEvolve on Circle Packing Task. Left: ShinkaEvolve outperforms AlphaEvolve's solution in fewer than 150 program evaluations. Right: ShinkaEvolve's program evolution tree demonstrates the iterative composition of stepping stones into high-performing solutions.

Furthermore, the applications to designing agentic harnesses for AIME mathematical reasoning, improving ALE-Bench competitive programming solutions, and discovering Mixture-of-Experts (MoE) load balancing loss functions show that ShinkaEvolve handles diverse tasks effectively.
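To make the notion of a program evaluation concrete for circle packing, the sketch below scores one candidate packing. The task definition is not spelled out in this summary, so the code assumes the common formulation in which circles must lie inside the unit square without overlapping and the objective is the sum of their radii. The function name `packing_score` and the tolerance `tol` are hypothetical.

```python
import math


def packing_score(circles, tol=1e-12):
    """Score a candidate packing given as a list of (x, y, r) tuples.

    Assumed formulation: circles lie inside the unit square, do not overlap,
    and the fitness is the sum of radii. Returns None for invalid packings.
    """
    for x, y, r in circles:
        inside = (x - r >= -tol and x + r <= 1 + tol
                  and y - r >= -tol and y + r <= 1 + tol)
        if r <= 0 or not inside:
            return None
    for i in range(len(circles)):
        xi, yi, ri = circles[i]
        for j in range(i + 1, len(circles)):
            xj, yj, rj = circles[j]
            # Two circles overlap when their centers are closer than r_i + r_j.
            if math.hypot(xi - xj, yi - yj) < ri + rj - tol:
                return None
    return sum(r for _, _, r in circles)


if __name__ == "__main__":
    valid = [(0.25, 0.25, 0.25), (0.75, 0.75, 0.25)]
    overlapping = [(0.3, 0.5, 0.3), (0.6, 0.5, 0.3)]
    print(packing_score(valid))
    print(packing_score(overlapping))
```

Under this assumption, each evolved program would output a list of circles, and an evaluator of this kind would supply the fitness used by the evolutionary loop.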

Implications and Future Work

ShinkaEvolve demonstrates that efficient program evolution frameworks can make significant progress on a range of computational problems. By reducing the number of samples required and providing open-source accessibility, it paves the way for broader adoption and further innovation in evolutionary computational strategies.

Future work could make the system more open-ended by enabling autonomous task specification and by further integrating meta-learning strategies to support continuous discovery.

Conclusion

ShinkaEvolve advances program evolution by combining parent sampling, novelty rejection sampling, and bandit-based LLM ensemble selection to reach strong results with few samples. Its open-source release and improved sample efficiency lower the barrier to LLM-driven discovery and support further work in the field.