Can large language models solve new, simple conjectures in advanced mathematics?

Investigate whether current or near-future large language models, operating without external tools, can produce correct proofs for novel, simple conjectures in advanced mathematical domains, thereby meeting the proposed Gödel Test.

Background

The paper proposes the Gödel Test as a benchmark for AI systems: proving very simple, previously unsolved conjectures in more advanced areas of mathematics. Despite strong claims about LLM performance on mathematics competitions, the authors highlight an open question: whether models can handle novel conjectures that require mathematical maturity.

Their experiments with GPT-5 gave mixed results, which motivates a systematic investigation of this capability.

References

Yet it remains unclear whether LLMs can solve new, simple conjectures in more advanced areas of mathematics.

— Gödel Test: Can Large Language Models Solve Easy Conjectures? (2509.18383 - Feldman et al., 22 Sep 2025) in Abstract
