- The paper presents a self-learning method using an iterative feedback loop and neural translation to synthesize induction predicates for solving difficult OEIS induction problems.
- This approach autonomously discovered predicates enabling the resolution of 5,565 problems, substantially exceeding the 2,265 solved by state-of-the-art systems like CVC5, Vampire, or Z3 on the same benchmark.
- The method demonstrates the potential of self-learning and neural techniques to advance automated theorem proving, setting a new baseline and suggesting applications in areas like software verification.
The paper "Learning Conjecturing from Scratch" presents a self-learning method for synthesizing induction predicates to solve induction problems derived from the OEIS (Online Encyclopedia of Integer Sequences). These 16,197 problems are challenging for existing SMT (Satisfiability Modulo Theories) and ATP (Automated Theorem Proving) systems because they require both inductive and arithmetic reasoning.
The proposed approach is based on an iterative feedback loop comprising four key steps:
- A neural translator is trained on pairs of solved problems and the induction predicates that helped solve them.
- The trained neural system generates many new candidate induction predicates for the problems.
- The Z3 SMT solver attempts to prove each problem with these predicates, using many short solver calls.
- Heuristics based on predicate size and proof speed select the best predicates for the next training phase.
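The four-step loop can be sketched as follows. This is a minimal illustration, not the paper's implementation: `train_translator`, `generate_predicates`, and `try_prove` are hypothetical stand-ins for the NMT model and the short Z3 calls.

```python
def train_translator(examples):
    """Hypothetical stand-in for NMT training: memorize (problem, predicate) pairs."""
    return dict(examples)

def generate_predicates(model, problem):
    """Hypothetical stand-in for neural generation: propose candidate predicates."""
    return [model.get(problem, "P0"), "P_" + problem]

def try_prove(problem, predicate):
    """Hypothetical stand-in for a short Z3 call; returns (solved, time, size).
    The success criterion here is an arbitrary deterministic placeholder."""
    solved = sum(map(ord, problem + predicate)) % 3 == 0
    return solved, len(problem + predicate) * 0.01, len(predicate)

def self_learning_loop(problems, rounds=3):
    solved = {}  # problem -> best (size, time, predicate)
    for _ in range(rounds):
        # Step 1: retrain the translator on predicates that worked so far.
        model = train_translator((p, best[2]) for p, best in solved.items())
        for prob in problems:
            # Step 2: generate candidate predicates for each problem.
            for pred in generate_predicates(model, prob):
                # Step 3: attempt a proof with each candidate.
                ok, t, size = try_prove(prob, pred)
                if not ok:
                    continue
                # Step 4: heuristic selection — prefer smaller, faster predicates.
                cand = (size, t, pred)
                if prob not in solved or cand < solved[prob]:
                    solved[prob] = cand
    return solved
```

Each round feeds the best discoveries of the previous round back into training, which is what lets the system bootstrap from no prior knowledge.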
Starting with no prior knowledge, the approach autonomously discovers useful induction predicates and solves 5,565 problems, compared with the 2,265 that state-of-the-art systems such as CVC5, Vampire, and Z3 solve in 60 seconds.
Key components of the approach include:
- Dataset Utilization: Problems are derived from the OEIS, where a sequence often has several proposed characterizations (e.g., a recurrence and a closed form). Proving these characterizations equivalent is an automated theorem proving task, and the ability to routinely prove such conjectures would critically improve how OEIS entries can be verified.
- Benchmark Evaluation: The OEIS ATP benchmark consists of SMT problems asserting the equivalence of two programs that generate the same sequence.
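To illustrate the kind of equivalence conjecture in the benchmark, consider two ways of computing the triangular numbers (OEIS A000217); the function names below are my own, not the benchmark's encoding:

```python
def triangular_rec(n):
    """Recursive program: a(0) = 0, a(n) = a(n-1) + n."""
    return 0 if n == 0 else triangular_rec(n - 1) + n

def triangular_closed(n):
    """Closed-form program: a(n) = n * (n + 1) / 2."""
    return n * (n + 1) // 2

# The benchmark problem asks for a proof that both programs agree for every n.
# That requires induction; checking finitely many values only suggests it.
assert all(triangular_rec(n) == triangular_closed(n) for n in range(100))
```

The SMT encoding of such a problem states that both programs produce the same value for all natural numbers, which is exactly where an induction predicate is needed.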
- Predicate Synthesis: The work focuses on synthesizing predicates that instantiate the second-order induction schema, allowing the Z3 solver to tackle OEIS problems. The synthesis relies on a grammar for constructing predicates and on an initial brute-force phase that generates diverse candidates.
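The brute-force phase can be sketched as a size-bounded enumeration from a grammar. The toy grammar below (terms over `n`, `0`, `1`, `+`, `*`, paired into equations) is an assumption for illustration; the paper's actual grammar is richer.

```python
from itertools import product

def enumerate_terms(max_size):
    """Enumerate arithmetic terms over n, 0, 1 by size (number of grammar nodes)."""
    terms = {1: ["n", "0", "1"]}
    for size in range(2, max_size + 1):
        terms[size] = []
        # An operator node costs 1; split the remaining size between subterms.
        for lsize in range(1, size - 1):
            rsize = size - 1 - lsize
            for op, l, r in product("+*", terms[lsize], terms[rsize]):
                terms[size].append(f"({l} {op} {r})")
    return [t for size in sorted(terms) for t in terms[size]]

def candidate_predicates(max_size):
    """Pair distinct terms into equational induction predicates P(n): lhs = rhs."""
    ts = enumerate_terms(max_size)
    return [f"{l} = {r}" for l in ts for r in ts if l != r]
```

Even this tiny grammar grows quickly with term size, which is why the neural translator is needed to focus generation on promising predicates rather than enumerating blindly.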
- Training a Neural System: The system uses neural machine translation (NMT) to generate predicates, cycling through training and evaluation phases and refining its induction predicates over time. The NMT models are sequence-to-sequence translators, which learn to map problem statements to useful predicates by capturing recurring patterns.
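A sequence-to-sequence translator consumes token sequences, so problems and predicates must be serialized. The tokenizer below is a hypothetical sketch of such preprocessing for S-expression-style terms, not the paper's actual pipeline:

```python
def tokenize(sexpr):
    """Split an S-expression string into tokens suitable as one NMT line."""
    return sexpr.replace("(", " ( ").replace(")", " ) ").split()

def make_training_pair(problem_sexpr, predicate_sexpr):
    """Format a (solved problem, useful predicate) pair as source/target lines."""
    return " ".join(tokenize(problem_sexpr)), " ".join(tokenize(predicate_sexpr))
```

Pairs like these are exactly what the feedback loop accumulates: each newly solved problem contributes a fresh source/target example for the next training phase.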
- Performance Metrics: The iterative self-learning process yields a final model that proves 5,372 problems in at most 48 seconds, demonstrating significant gains in both the speed and the breadth of problem-solving.
The paper highlights the benefits of self-learning and neural techniques in automated theorem proving, particularly through the synthesis of induction predicates. The approach not only sets a new baseline for the number of problems solved on this benchmark but also points to broader applications, such as software verification and loop-invariant verification in larger industrial contexts.