AlphaResearch: Accelerating New Algorithm Discovery with Language Models

Published 11 Nov 2025 in cs.CL | (2511.08522v1)

Abstract: LLMs have made significant progress on complex but easy-to-verify problems, yet they still struggle with discovering the unknown. In this paper, we present \textbf{AlphaResearch}, an autonomous research agent designed to discover new algorithms for open-ended problems. To synergize the feasibility and innovation of the discovery process, we construct a novel dual research environment that combines execution-based verification with a simulated real-world peer-review environment. AlphaResearch discovers new algorithms by iteratively running the following steps: (1) propose new ideas, (2) verify the ideas in the dual research environment, and (3) optimize the research proposals for better performance. To promote a transparent evaluation process, we construct \textbf{AlphaResearchComp}, a new evaluation benchmark comprising a competition of eight open-ended algorithmic problems, each carefully curated and verified through executable pipelines, objective metrics, and reproducibility checks. AlphaResearch achieves a 2/8 win rate in head-to-head comparison with human researchers, demonstrating the possibility of accelerating algorithm discovery with LLMs. Notably, the algorithm discovered by AlphaResearch on the \emph{``packing circles''} problem achieves best-of-known performance, surpassing the results of human researchers and strong baselines from recent work (e.g., AlphaEvolve). Additionally, we conduct a comprehensive analysis of the remaining challenges in the 6/8 failure cases, providing valuable insights for future research.

Summary

  • The paper introduces AlphaResearch, an autonomous research agent that integrates idea generation, execution-based verification, and simulated peer review to accelerate algorithm discovery.
  • It employs a reward model trained on historical peer reviews that reaches 72% classification accuracy, outperforming baseline models.
  • AlphaResearch demonstrates practical value by surpassing the human benchmark on the packing circles problem, showcasing its potential on open-ended research tasks.

AlphaResearch: Accelerating New Algorithm Discovery with LLMs

The paper introduces AlphaResearch, an autonomous agent that leverages LLMs to accelerate the discovery of new algorithms for open-ended problems. The agent integrates idea generation with a dual research environment that combines execution-based verification and simulated peer-review rewards.

Motivation and Background

The motivation for developing AlphaResearch stems from the limitations observed in previous approaches to algorithm discovery with LLMs. Traditional methods often suffer from biased verification and misalignment when evaluating machine-generated ideas, which restricts the exploration of novel concepts. Execution-based verification systems such as AlphaEvolve can rigorously validate a program's technical correctness but do not assess its scientific significance, while idea-generation systems judged by LLMs may propose innovative concepts that are computationally infeasible.

AlphaResearch Approach

AlphaResearch combines feasibility and innovation through a dual research environment. The agent iteratively performs three steps: proposing new ideas, verifying them within the dual environment, and optimizing the research proposals for better performance (Figure 1).

Figure 1: Comparison of OpenEvolve, ShinkaEvolve, and AlphaResearch on Packing Circles (n=26) problems. AlphaResearch outperforms others.
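
To make the loop concrete, the sketch below spells out one plausible reading of the three-step cycle in Python, with the proposer, reviewer, implementer, and executor passed in as callables. All names, signatures, and the review threshold are illustrative assumptions, not the paper's actual interfaces.

```python
# Minimal sketch of the propose -> verify -> optimize loop described above.
# The callables (propose, review_score, implement, execute_and_score) are
# hypothetical stand-ins for the agent's components, not the paper's interfaces.
from typing import Callable, Optional, Tuple

def alpha_research_loop(
    task: dict,
    propose: Callable,            # LLM: (task, history) -> idea text
    review_score: Callable,       # peer-review reward model: idea -> float
    implement: Callable,          # LLM: (idea, history) -> program source
    execute_and_score: Callable,  # code executor: (program, task) -> float or None
    n_rounds: int = 100,
    review_threshold: float = 5.5,
) -> Optional[Tuple[str, str, float]]:
    best = None       # best (idea, program, score) triple seen so far
    history = []      # trajectory of accepted states, used to seed later proposals

    for _ in range(n_rounds):
        idea = propose(task, history)
        if review_score(idea) < review_threshold:
            continue  # peer-review gate: skip ideas judged weak or infeasible
        program = implement(idea, history)
        score = execute_and_score(program, task)
        if score is None:
            continue  # program failed verification or crashed
        history.append((idea, program, score))
        if best is None or score > best[2]:
            best = (idea, program, score)
    return best
```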

Dual Research Environment

This environment combines a simulated real-world peer-review reward with execution-based rewards. The peer-review environment is provided by a reward model (AlphaResearch-RM-7B) trained on historical peer-review data, which scores idea candidates beyond what coding-based verification alone can assess (Figure 2).

Figure 2: The launch of AlphaResearch involves training reward models using peer-reviewed records and preparing initial proposals.
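
As a rough illustration of how such a reward model might gate idea candidates, the sketch below scores an idea abstract and applies an acceptance threshold. The checkpoint path, the scalar-regression head, and the 5.5 cutoff (the rating threshold mentioned later under Knowledge Gaps) are all assumptions for illustration; the paper's actual AlphaResearch-RM-7B interface may differ.

```python
# Illustrative sketch of filtering idea candidates with a peer-review reward model.
# The checkpoint path, model head, and threshold are assumptions, not the paper's setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

RM_PATH = "path/to/AlphaResearch-RM-7B"  # hypothetical local checkpoint
tokenizer = AutoTokenizer.from_pretrained(RM_PATH)
reward_model = AutoModelForSequenceClassification.from_pretrained(RM_PATH, num_labels=1)
reward_model.eval()

def review_score(idea_abstract: str) -> float:
    """Return a scalar 'simulated peer review' score for an idea abstract."""
    inputs = tokenizer(idea_abstract, return_tensors="pt", truncation=True, max_length=2048)
    with torch.no_grad():
        return reward_model(**inputs).logits.squeeze().item()

def passes_review(idea_abstract: str, threshold: float = 5.5) -> bool:
    # The 5.5 gate mirrors the rating cutoff noted under Knowledge Gaps;
    # it is used here purely for illustration.
    return review_score(idea_abstract) >= threshold
```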

Execution-Based Verification

Execution-based rewards come from an automatic verification environment that focuses on the program-execution side of algorithm discovery: generated programs are run and scored, and promising sequences are stored for optimization in subsequent rounds.
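
For intuition, a verifier for the packing circles task (per the Glossary: place disjoint circles in a square to maximize the sum of radii) might look like the minimal sketch below. It checks the problem constraints and returns the objective as the execution-based reward; the unit-square domain, tolerance, and return convention are assumptions, not the paper's actual evaluator.

```python
import math

# Minimal sketch of an execution-based verifier for the packing circles task:
# validate the constraints, then return the objective (sum of radii) as the reward.
def verify_and_score(circles, eps=1e-9):
    """circles: list of (x, y, r) triples intended to lie inside [0, 1] x [0, 1]."""
    for x, y, r in circles:
        if r < 0:
            return None  # invalid radius
        if x - r < -eps or x + r > 1 + eps or y - r < -eps or y + r > 1 + eps:
            return None  # circle leaves the square
    for i in range(len(circles)):
        for j in range(i + 1, len(circles)):
            xi, yi, ri = circles[i]
            xj, yj, rj = circles[j]
            if math.hypot(xi - xj, yi - yj) < ri + rj - eps:
                return None  # circles overlap
    return sum(r for _, _, r in circles)  # execution-based reward
```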

AlphaResearchComp Benchmark

The AlphaResearchComp benchmark was constructed to compare AlphaResearch with human researchers across 8 open-ended research tasks. This benchmark tests the agent's discovery capabilities against human-best results in fields such as geometry, combinatorial optimization, and harmonic analysis.

AlphaResearch achieves a 2/8 win rate against expert humans, most notably improving solutions to the "Packing Circles" problem, where it attains a sum of radii surpassing the human benchmark (Figure 3).

Figure 3: Execution-based reward trends for AlphaResearch on packing circles (n=26) and third autocorrelation inequality problems.
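
The benchmark's excel@best metric is described in the Glossary as the percentage improvement over the best human baseline. Since the Knowledge Gaps section notes that the paper's printed formula is malformed, the sketch below is only one plausible reading of that definition, with directionality handling added as an assumption.

```python
# One plausible reading of an excel@best-style score: percentage improvement of the
# agent's result over the best human baseline. Not the paper's exact formula.
def excel_at_best(agent_score: float, human_best: float, maximize: bool = True) -> float:
    """Percentage improvement over the human-best baseline (positive = better)."""
    if human_best == 0:
        raise ValueError("human_best baseline must be nonzero")
    if maximize:
        return 100.0 * (agent_score - human_best) / abs(human_best)
    return 100.0 * (human_best - agent_score) / abs(human_best)

# Example (generic numbers): an agent score of 105 versus a human-best of 100
# on a maximization task yields excel_at_best(105, 100) == 5.0.
```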

Implementation Details

AlphaResearch's iterative algorithm proposes ideas guided by the peer-review-informed reward model, then generates and revises programs that are refined through execution-based optimization. Over the course of a run, the agent's trajectory shows ideas and execution results improving together toward novel algorithmic solutions.

Reward Model

AlphaResearch-RM-7B was trained on peer reviews from past ICLR conferences and is used to filter for fresh ideas that surpass prior results in novelty and impact. The reward model reaches 72% classification accuracy on unseen ICLR submissions, outperforming baseline models (Figure 4).

Figure 4: Impact of peer review environment on execution results, highlighting AlphaResearch-RM-7B's filtering effectiveness.
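
For illustration, turning historical peer-review records into binary training labels for such a reward model could look like the sketch below, using the 5.5 average-rating cutoff mentioned under Knowledge Gaps. The record fields and schema are hypothetical, not the paper's data format.

```python
# Hypothetical sketch of building binary training labels from peer-review records.
# The fields "abstract" and "avg_rating" are illustrative assumptions.
def build_rm_examples(records: list[dict], threshold: float = 5.5) -> list[dict]:
    examples = []
    for rec in records:
        label = 1 if rec["avg_rating"] >= threshold else 0  # accept-vs-reject proxy
        examples.append({"text": rec["abstract"], "label": label})
    return examples
```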

Analysis and Ablations

Comparisons with OpenEvolve and execution-only research agents show that the dual research environment yields superior performance: the peer-review-guided proposal process drives more effective idea verification and optimization (Figure 5).

Figure 5: Example of AlphaResearch formatted task.

Case Study

A "Packing Circles" case study illustrates AlphaResearch's process, starting with initial proposals evolving through systematic idea verification and optimal program execution.

Conclusion

AlphaResearch establishes a pathway for using LLMs to push the boundaries of human research, providing insights for future research-agent development. Limitations remain in extending applicability to real-world problems and in scaling up and refining the reward model (Figure 6).

Figure 6: New construction of AlphaResearch surpassing AlphaEvolve bounds on packing circles, illustrating enhanced sum of radii.

Explain it Like I'm 14

Overview of the Paper

This paper introduces AlphaResearch, an AI “research agent” that tries to invent new algorithms—step-by-step, like a scientist—using LLMs. The goal is to see whether AI can go beyond solving known problems and actually discover better ways to solve open-ended challenges, sometimes even beating the best solutions humans have found so far.

Key Objectives or Questions

The authors set out to answer simple but big questions:

  • Can AI discover genuinely new algorithms, not just repeat what’s already known?
  • How can we reliably check if an AI’s research ideas are both creative and practical?
  • What kind of testing setup (like a competition) would fairly compare AI versus human researchers?

How Did the Researchers Do It?

To make AI discovery both creative and trustworthy, they built a “dual research environment.” Think of it like a science fair with two judges:

  1. A simulated peer-review “judge” (like real scientists reviewing papers):
    • They trained a “reward model” using thousands of real peer reviews from ICLR (a top AI conference).
    • This model gives a score to each idea based on qualities like novelty, relevance, and likely impact.
    • It helps filter out ideas that sound cool but aren’t realistic.
  2. A code execution “judge” (like a referee checking the rules and score):
    • The AI writes code for its idea.
    • The code is run automatically to check two things:
      • Does it follow the rules of the problem?
      • How well does it perform according to a clear score?

AlphaResearch works in a loop, similar to how you’d improve a school project:

  • Propose a new idea.
  • Check if the idea is promising using the peer-review model.
  • If it passes, write or update code for the idea.
  • Run the code and see how well it scores.
  • Keep the best results and repeat until it beats the human-best or runs out of time.

To compare AI and humans fairly, they built a public benchmark called AlphaResearchComp:

  • It includes 8 open-ended algorithm problems from areas like geometry and number theory.
  • Each problem has a well-defined scoring rule and a “human-best” result from the literature.
  • Everything is checked automatically for fairness and reproducibility.

Main Findings and Why They Matter

Here are the key results, explained simply:

  • AlphaResearch beat human-best results on 2 out of 8 problems.
    • Most notably, it set new records on the “Packing Circles” problem:
    • For 26 circles, it improved the best-known result.
    • For 32 circles, it achieved the best-of-known performance worldwide.
    • “Packing Circles” means fitting n circles inside a square so the total size (sum of radii) is as large as possible. Imagine arranging coins in a box to fit the biggest total size without overlapping—they found smarter arrangements.
  • The peer-review reward model worked well:
    • Trained on real conference reviews, it correctly classified good vs. not-so-good ideas about 72% of the time on new data, better than both the humans in their test and other baseline models.
    • In practice, it filtered out many ideas that would later fail when implemented, saving time and focusing effort.
  • The process steadily improved results over time:
    • In most problems, the execution score climbed as the agent iterated ideas and code, showing it can “learn” and refine just like a researcher.
  • But there are limits:
    • On 6 problems, AlphaResearch could not beat the human-best.
    • When starting from already excellent human solutions (like Littlewood polynomials and MSTD), it often couldn’t find better ones—suggesting current AI still struggles with pushing past very strong human work.

These results matter because they show that:

  • AI can sometimes invent better algorithms than humans.
  • Combining “creative idea scoring” (peer review) with “hard checks” (code execution) balances imagination and practicality.
  • Transparent, well-defined benchmarks help track real progress.

Methods Explained in Everyday Language

  • Algorithm: A step-by-step recipe for solving a problem.
  • LLM: An AI that reads and writes text; here, it generates research ideas and code.
  • Reward Model: A trained AI “judge” that gives a score to ideas, similar to reviewers grading a science fair project.
  • Execution-based verification: Running the code to make sure it follows rules and to measure how well it works—like testing a robot you built to see how fast it completes a maze.

An analogy: Imagine designing a new video game strategy.

  • The peer-review model is like a coach telling you whether your strategy is promising.
  • The code executor is like actually playing the game with your strategy to see your score.
  • AlphaResearch keeps proposing strategies, checks them with the coach, tries them in the game, and keeps the best one, repeating until it’s better than anyone else’s.

Implications and Potential Impact

  • Accelerating discovery: AI could help scientists and engineers explore more ideas quickly, filter out weak ones, and implement the promising ones—speeding up progress.
  • Better research tools: Training reward models on real reviews brings “human-like taste” to AI, guiding it toward meaningful ideas rather than just flashy ones.
  • Real-world applications: In the future, similar methods could optimize things like chip design, scheduling, network routing, or fast math routines (tensor computations), where tiny improvements matter a lot.
  • Next steps: The authors suggest using bigger and better reward models, adding more tools, and tackling more complex problems to push AI discovery further.

In short, AlphaResearch shows a practical way for AI to act like a researcher: think of new ideas, judge them like experts, test them with code, and steadily improve—sometimes beating the best humans, and often learning from failure.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

The following list summarizes what remains missing, uncertain, or unexplored, and suggests concrete directions future researchers could act on:

  • Dual-reward design is under-specified: the paper does not formalize how peer-review reward and execution-based reward are combined, weighted, or scheduled over time. Provide a principled framework (e.g., multi-objective optimization, adaptive weighting, or bandit-style scheduling) and ablate thresholds/weights on discovery rate and final performance.
  • Domain mismatch in the reward model (RM): AlphaResearch-RM-7B is trained on ICLR ML peer-review records but used to filter ideas in mathematical/algorithmic tasks (geometry, number theory). Quantify and mitigate cross-domain misalignment (train domain-specific RMs, perform cross-domain calibration, or use task-specific review corpora).
  • RM input signal is limited: training uses only paper abstracts and the average overall rating. Evaluate the impact of richer inputs (full reviews, per-criterion scores like novelty/significance/correctness, rebuttals, meta-reviews) on filtering quality and discovery outcomes.
  • RM evaluation is narrow: reported binary accuracy at a fixed 5.5 threshold, with no calibration, ROC/AUC, reliability, or error structure analyses. Report calibration curves, correlation with downstream execution success, precision/recall at multiple thresholds, and inter-annotator agreement baselines.
  • False negatives from RM are nontrivial: 43 viable ideas were rejected by RM in one ablation. Investigate adaptive thresholds, exploration–exploitation strategies, ensembling, or uncertainty-aware acceptance to reduce rejection of promising ideas while keeping failure rates low.
  • Execution environment details are insufficient: verification and measurement modules, numerical tolerances, randomness control (seeds), and robustness checks are not fully documented. Provide formal specifications, determinism guarantees, and sensitivity analyses to ensure reproducibility and fairness across agents.
  • Code execution safety is not addressed: there is no description of sandboxing, resource limits, or security policies when running LLM-generated code. Document isolation mechanisms, time/memory quotas, and safe I/O to prevent exploits and ensure consistent evaluation.
  • Agent sampling and credit assignment are opaque: the policy for sampling prior trajectory states (i_t, p_t, r_t) is unspecified (e.g., uniform, prioritized replay, or performance-weighted). Ablate sampling strategies and study their impact on convergence speed and final outcomes.
  • Compute budgets and fairness across baselines are unclear: comparisons (e.g., with OpenEvolve/ShinkaEvolve) lack standardized compute/tokens/iterations and use a proprietary model (o4-mini). Report per-task budgets, iteration counts, token usage, wall-clock time, and hardware; match budgets for fair head-to-heads.
  • Benchmark scope is small (8 tasks) and improvements are marginal in several cases: expand problem set, include difficulty tiers, diverse domains (discrete/combinatorial, continuous optimization, formal proofs), and measure scalability to harder/realistic tasks.
  • Human-best baselines may be outdated or unverified: the paper relies on literature/online repositories for “best-of-human” values without independent validation. Establish a community-reviewed baseline registry, reproduce baselines in the same environment, and track updates over time.
  • The excel@best metric is ill-defined in the paper (malformed equation and unclear indicator semantics): formally define the metric, directionality, normalization, and aggregation; report aggregate statistics with confidence intervals and significance testing.
  • Generalization of discovered “algorithms” is untested: results appear instance-specific (e.g., circle packing at n=26/32). Evaluate whether discovered methods generalize across parameter sweeps (vary n, dimensionality), adjacent problem families, and unseen tasks.
  • Initialization sensitivity is not analyzed: starting from scratch vs. starting from best-known solutions likely affects exploration and outcomes. Systematically ablate initialization strategies and quantify their effect on trajectory and final performance.
  • Contribution of the peer-review RM across tasks is only shown in one ablation: conduct broader ablations to measure RM’s effect on sample efficiency, success rate, and final scores across all benchmark tasks.
  • Idea-to-code translation mechanics are underspecified: how are ideas converted into patches? What edit operations, tools, or verification scaffolds are used? Detail patching protocols, diff strategies, test scaffolding, and error-repair loops; ablate these components.
  • Human-in-the-loop settings are unexplored: assess whether occasional human feedback (e.g., on RM errors or feasibility checks) improves discovery, and quantify cost–benefit versus fully autonomous operation.
  • Lack of theoretical insight: discovered solutions are reported as performance numbers without formal analysis (e.g., invariants, convergence guarantees, or proofs of optimality). Provide post-hoc characterization, derive heuristics or structural properties, and explore formal certification.
  • RM bias and fairness are not audited: ML review-trained RM may favor specific writing styles, topics, or conventional framing, penalizing unconventional mathematical ideas. Audit for domain/style biases and implement debiasing or diversity-promoting mechanisms.
  • RM scaling laws and data composition effects are not studied: explore how model size, training data size, and data heterogeneity affect RM accuracy and downstream discovery; perform controlled scaling-law experiments.
  • External validation of “best-of-known” claims is missing: solicit domain-expert verification (e.g., circle packing community), replicate results in independent implementations, and submit findings to peer-reviewed venues for validation.
  • No multi-objective optimization framework: novelty, feasibility, and performance are treated via a single gate plus execution reward. Develop Pareto-front optimization, composite rewards with tunable trade-offs, and learning-to-rank strategies to balance competing objectives.
  • Handling partially feasible/non-executable ideas is simplistic (skip-on-fail): investigate idea salvage strategies (decomposition, scaffolding, iterative refinement), and measure recovery rates and downstream impact.
  • Failure-case analysis is claimed but not detailed in the main text: provide a systematic taxonomy of the 6/8 failures, root-cause analyses (RM errors, execution brittleness, search stagnation), and tested mitigations with quantified effects.
  • Reproducibility details are incomplete: random seeds, environment versions, dependency management, and deterministic pipelines are not fully specified. Release comprehensive reproducibility artifacts and reproducibility reports.
  • Ethical considerations and attribution are absent: discuss risks of generating misleading “algorithms,” attribution to prior work, and policies for claims of novelty; implement plagiarism/duplication checks and disclosure practices.
  • Limited comparison with AlphaEvolve/ShinkaEvolve: the comparison is restricted to one task and early steps; problems sets are not aligned. Conduct matched, multi-task head-to-head evaluations with equal budgets and standardized environments.
  • Stopping criteria and convergence are heuristic (max rounds n): investigate principled stopping rules (e.g., plateau detection, statistical improvement tests), anytime performance tracking, and convergence diagnostics.
  • Integration with external tools is only suggested: evaluate the impact of incorporating theorem provers, mathematical optimization libraries, specialized solvers, and search heuristics; quantify gains and identify best synergies.

Practical Applications

Practical Applications Derived from the Paper

The following applications translate AlphaResearch’s findings and innovations—its dual research environment (peer-review-trained reward model plus execution-based verification), the AlphaResearchComp benchmark, and demonstrated algorithmic improvements—into actionable use cases across industry, academia, policy, and daily life. Each item notes sector(s), potential tools/products/workflows, and assumptions/dependencies that affect feasibility.

Immediate Applications

  • Autonomous algorithm improvement in production software
    • Sectors: software, HPC, ML systems
    • Use case: Automatically propose, filter, implement, and evaluate algorithmic patches for performance-critical code paths (e.g., sorting, graph ops, numerical kernels) within CI/CD.
    • Tools/Products/Workflows: “Idea-to-Patch” IDE plugin; Dual-reward CI step combining AlphaResearch-RM for idea filtering and a unit/integration/perf harness for execution scoring; excel@best dashboards to track gains over baselines.
    • Assumptions/Dependencies: Robust test suites and perf benchmarks; sandboxed executors; access to a capable LLM; governance for patch acceptance and rollback; IP/licensing checks for AI-generated code.
  • Kernel and tensor compiler optimization
    • Sectors: ML infra, HPC
    • Use case: Search and validate faster tensor kernels (e.g., GEMM variants, convolution tiling, fusion strategies) for frameworks like TVM, XLA, Triton.
    • Tools/Products/Workflows: “AlphaKernelSearch” plugin that integrates with compiler autotuners; execution-based profilers; RM thresholding to avoid infeasible ideas.
    • Assumptions/Dependencies: Stable evaluation harnesses on target hardware; reproducible perf measurements; compute budget for large search; device-specific constraints.
  • Heuristic search for packing, layout, and scheduling
    • Sectors: logistics, manufacturing, robotics, EDA (chip floorplanning)
    • Use case: Adapt circle-packing advances to bin packing, warehouse slotting, PCB/component placement, robotic item arrangement, and production scheduling.
    • Tools/Products/Workflows: “Packing Heuristic Studio” with domain constraint validators; execution-based scoring (utilization, collision, time); idea filtering via RM for novelty/feasibility.
    • Assumptions/Dependencies: Accurate domain constraints; simulation fidelity to real-world conditions; human oversight for safety-critical deployments.
  • R&D ideation triage and prioritization
    • Sectors: industry R&D, academia
    • Use case: Rank and filter internal proposal abstracts for novelty and likely impact before allocating experiment resources.
    • Tools/Products/Workflows: RM-as-a-service (“Idea Scoring RM”); reviewers’ dashboards; calibrated thresholds; active learning loops to refine the RM on local outcomes.
    • Assumptions/Dependencies: RM bias auditing; domain adaptation to non-ICLR areas; institutional policies for AI-assisted evaluation.
  • Reproducible algorithm benchmarking-as-a-service
    • Sectors: academia, developer tooling
    • Use case: Host AlphaResearchComp-style competitions with executable pipelines, clear metrics, and reproducibility checks to evaluate new algorithms and code.
    • Tools/Products/Workflows: Benchmark orchestration platform; standardized evaluators; excel@best metric; artifact registry to share code/results.
    • Assumptions/Dependencies: Curated tasks with verifiable objectives; hosting costs; community adoption and maintenance.
  • Developer productivity assistant for iterative code improvement
    • Sectors: software
    • Use case: Within IDEs, propose micro-optimizations or refactors, verify them against tests and perf benchmarks, and only accept validated improvements.
    • Tools/Products/Workflows: “AlphaResearch IDE Assistant” tying idea generation to targeted code patches and automated execution; perf regression guardrails.
    • Assumptions/Dependencies: High-quality test/perf coverage; safe code application; developer-in-the-loop review.
  • Operations research experiment automation
    • Sectors: transportation, supply chain, energy dispatch
    • Use case: Automate exploration of heuristic variants (e.g., routing, load balancing) with execution scoring against historical or simulated demand.
    • Tools/Products/Workflows: Domain simulators; RM filters for idea quality; batch search workflows with top-k surfacing and reproducibility logs.
    • Assumptions/Dependencies: Representative simulation datasets; stability of scoring functions; approval processes for production policy changes.
  • Curriculum modules for teaching algorithm discovery
    • Sectors: education, academia
    • Use case: Course labs where students use the dual environment to propose, implement, and validate new algorithmic ideas with transparent pipelines.
    • Tools/Products/Workflows: “AlphaResearch Classroom” notebooks; prebuilt evaluators; grading via excel@best improvement over baselines.
    • Assumptions/Dependencies: Institution compute limits; appropriate problem selection; pedagogical guidance to avoid over-reliance on the LLM.
  • Peer-review support signals (non-binding)
    • Sectors: academia, scholarly publishing
    • Use case: Provide auxiliary signals on novelty/likelihood-of-acceptance for editors to triage workloads and flag potential outliers for extra scrutiny.
    • Tools/Products/Workflows: RM scoring service integrated with submission platforms; dashboards for editors; bias monitoring reports.
    • Assumptions/Dependencies: Strong governance; transparency for authors; RM fairness audits; strict “assistive only” usage to avoid gatekeeping by AI.
  • Security and QA: guided fuzzing and test generation
    • Sectors: software security, QA
    • Use case: Use the idea-generation and execution loop to propose test inputs or harnesses, execute them, and keep only those that increase coverage or find failures.
    • Tools/Products/Workflows: “AutoFuzz with RM” plugin; coverage and crash-based scoring; patch proposals for test suites.
    • Assumptions/Dependencies: Safe execution sandboxes; robust telemetry; triaging workflows for found issues.

Long-Term Applications

  • Autonomous discovery of domain-specific algorithms in healthcare
    • Sectors: healthcare, medical physics, operations
    • Use case: Radiotherapy treatment planning (beam placement, dose optimization), patient scheduling, and resource allocation with validated executable pipelines.
    • Tools/Products/Workflows: Hospital-grade simulators; RM adapted to medical novelty/impact; clinical validation workflows.
    • Assumptions/Dependencies: Regulatory approval; rigorous safety and ethics review; data privacy; high-fidelity simulators.
  • Energy grid optimization and storage dispatch
    • Sectors: energy, utilities
    • Use case: Discover algorithms for grid scheduling, demand response, storage control under uncertainty.
    • Tools/Products/Workflows: Grid simulators; execution scoring based on stability/cost; RM trained on energy domain literature and outcomes.
    • Assumptions/Dependencies: Access to grid models/data; reliability guarantees; compliance with regulatory constraints.
  • Quantitative finance strategy search under strict evaluation
    • Sectors: finance
    • Use case: Propose trading strategies or risk models; execute on backtesting/simulation; filter for novelty and robustness.
    • Tools/Products/Workflows: Backtesting harnesses; RM calibrated to financial novelty/impact; risk management guardrails.
    • Assumptions/Dependencies: Data licensing; overfitting mitigation; compliance and auditability.
  • Integrated scientific discovery with lab automation
    • Sectors: biotech, materials science, chemistry
    • Use case: Combine idea generation and code execution with robotic labs to run experiments and refine hypotheses.
    • Tools/Products/Workflows: “Lab-in-the-loop Research Agent”; experiment planners; RM tuned on domain peer-review data; safety interlocks.
    • Assumptions/Dependencies: Robotic infrastructure; experimental ontologies; safety protocols; reproducible experimental pipelines.
  • Formal theorem discovery and proof verification
    • Sectors: academia, software verification
    • Use case: Generate conjectures, implement proof attempts or counterexample searches, and validate via formal provers.
    • Tools/Products/Workflows: Integration with proof assistants (Lean/Isabelle/Coq); RM trained on math venues; execution scoring via proof success.
    • Assumptions/Dependencies: Formalization of targets; compute for proof search; expert oversight for interpretation.
  • AutoML research copilot for discovering losses, augmentations, and architectures
    • Sectors: ML research, applied ML
    • Use case: Propose new training objectives or architectures; execute training runs; keep only reproducible improvements.
    • Tools/Products/Workflows: “AutoAlgo” platform; RM tuned on ML venues beyond ICLR; excel@best measured on canonical benchmarks.
    • Assumptions/Dependencies: Significant compute; robust contamination controls; multi-seed, multi-dataset validation.
  • EDA: chip floorplanning and placement algorithm evolution
    • Sectors: semiconductors
    • Use case: Discover heuristics for macro placement, routing, timing closure with executable scoring against design constraints.
    • Tools/Products/Workflows: Integration with commercial/open EDA tools; RM adapted to EDA literature; domain-specific validators.
    • Assumptions/Dependencies: Access to proprietary flows; industrial-scale compute; IP constraints; tight coupling to fabrication requirements.
  • Multi-agent competitive discovery platforms
    • Sectors: research tooling, platforms
    • Use case: Host “research tournaments” where agent ensembles propose, implement, and challenge algorithms against shared evaluators.
    • Tools/Products/Workflows: Orchestration for agent competition; transparency tooling (provenance, logs); reputation systems.
    • Assumptions/Dependencies: Standardized tasks; incentive design; security and fairness mechanisms.
  • Policy: AI-assisted grant triage and program design with audited RM
    • Sectors: public policy, funding agencies
    • Use case: Use RM signals to triage large volumes of proposals and identify areas where agent-based discovery is promising; inform programmatic priorities.
    • Tools/Products/Workflows: Auditable RM pipelines; bias and fairness dashboards; human-in-the-loop decision processes.
    • Assumptions/Dependencies: Legal/ethical frameworks; stakeholder buy-in; transparency mandates; continuous calibration.
  • Standardization of excel@best and executable pipelines for scientific publishing
    • Sectors: academia, publishers
    • Use case: Require reproducible evaluators and excel@best-style reporting for algorithmic claims.
    • Tools/Products/Workflows: Publisher toolkits; artifact evaluation services; community-maintained evaluators.
    • Assumptions/Dependencies: Community consensus; maintenance funding; variability across disciplines.
  • Algorithm marketplace and provenance registry
    • Sectors: software platforms, IP markets
    • Use case: Curate and license agent-discovered algorithms with verified performance and detailed provenance logs.
    • Tools/Products/Workflows: Registries with execution traces; licensing workflows; integration SDKs.
    • Assumptions/Dependencies: IP frameworks for AI-generated artifacts; liability structures; validation guarantees.
  • Scaling and debiasing the Reward Model with multi-venue, multi-domain data
    • Sectors: cross-sector
    • Use case: Build larger, domain-adapted RMs that generalize beyond ICLR into energy, healthcare, finance, and materials science.
    • Tools/Products/Workflows: Data ingestion pipelines from diverse peer-review sources; adversarial bias audits; continual learning mechanisms.
    • Assumptions/Dependencies: Data access permissions; privacy/compliance; rigorous evaluation across domains.

Glossary

  • AlphaEvolve: An execution-based autonomous coding agent for scientific and algorithmic discovery, used as a baseline for comparison. "Inspired by AlphaEvolve \citep{novikov2025alphaevolve}, we construct automatic evaluation process with code executor"
  • AlphaResearch: An autonomous research agent that iteratively proposes, verifies, and optimizes ideas to discover new algorithms. "AlphaResearch discovers new algorithms by iteratively running the following steps: (i) proposing new research ideas, (ii) verify the ideas in the dual research-based environment, and (iii) optimizing the proposals for higher reward from the environment."
  • AlphaResearchComp: A benchmark-style competition of eight open-ended algorithmic problems for transparent evaluation against human researchers. "we construct AlphaResearchComp, a simulated discovery Competition between research agents and human researchers"
  • AlphaResearch-RM-7B: A fine-tuned 7B reward model trained on peer-review records to score and filter research ideas. "We fine-tune Qwen2.5-7B-Instruct with the RM pairs, yielding the AlphaResearch-RM-7B model."
  • autoconvolution peak minimization: An optimization task aiming to minimize the peak value of a function’s autoconvolution (convolution of a function with itself). "autoconvolution peak minimization (upper bound)"
  • bfloat16: A 16-bit floating-point format with a wide exponent range, used to speed training while maintaining numeric stability. "using bfloat16 precision under FSDP"
  • code executor: An automated system that runs generated code to verify constraints and measure performance. "we construct automatic evaluation process with code executor"
  • excel@best: A metric measuring percentage improvement over the best human baseline on open-ended problems. "we propose a new metric - excel@best (excel at best)"
  • execution-based verification: Validating solutions by running code to check constraints and measure outcomes. "Execution-based verification systems like AlphaEvolve \cite{novikov2025alphaevolve} can rigorously validate whether code runs and meets constraints"
  • FSDP: Fully Sharded Data Parallel; a distributed training strategy that shards model states across devices. "using bfloat16 precision under FSDP"
  • ideation–execution gap: The discrepancy between generating research ideas and successfully implementing them into working solutions. "the ideation–execution gap \citep{si2025ideation} between generating and executing on new ideas"
  • knowledge contamination: Leakage where evaluation data overlaps with training or model knowledge, compromising fairness. "which prevents knowledge contamination between the train and test split."
  • Littlewood polynomials: Polynomials with coefficients restricted to ±1, important in harmonic analysis and combinatorics. "littlewood polynomials (n=512)"
  • max-min distance ratio: An objective that considers the ratio of maximal to minimal pairwise distances, often minimized for uniform configurations. "minimizing max-min distance ratio"
  • MSTD: “More Sums Than Differences” sets; integer sets with more distinct pair sums than differences. "MSTD (n=30)"
  • packing circles: A geometric optimization problem of placing disjoint circles in a square to maximize the sum of radii. "packing circles (n=32)"
  • pass@k: A code-generation metric where success is counted if any of k independent attempts passes verification. "pass@k~\citep{chen2021evaluating} is a metric denoting that at least one out of kk i.i.d. task trials is successful"
  • program-based verification: Validation using executable programs to check constraints and compute objective scores. "The synergy between an iterative real-world peer review environment and program-based verification empowers AlphaResearch"
  • reward model: A learned model that assigns scores to guide search toward promising ideas and implementations. "we train a reward model with ideas from real-world peer-review information to simulate the real-world peer-review environment."
  • simulated real-world peer review environment: An evaluation setup emulating academic peer-review to score novelty and feasibility of ideas. "by the simulated real-world peer-reviewed environment and execution-based verification."
  • spherical code: An arrangement of points on a sphere maximizing minimal angular separation; used in coding theory and discrete geometry. "spherical code (d=3, n=30)"
  • SWE-Bench: A benchmark of real-world GitHub issues for evaluating LLMs’ software engineering capabilities. "SWE-Bench \citep{jimenez2024swebench} introduces the problems in real-world software development."
  • trajectory: The sequence of ideas, programs, and results generated over iterative optimization by the agent. "The resulting trajectory is denoted as $\tau = i_0 p_0 r_0 \dots i_{n-1} p_{n-1} r_{n-1} i_n p_n r_n$"
  • Verification module: A component that checks whether a generated program meets explicit problem constraints. "(i) Verification module that validates whether $p_k$ conforms to the problem constraints."
