Determination of the fifth Busy Beaver value (2509.12337v1)
Abstract: We prove that $S(5) = 47,176,870$ using the Coq proof assistant. The Busy Beaver value $S(n)$ is the maximum number of steps that an $n$-state 2-symbol Turing machine can perform from the all-zero tape before halting, and $S$ was historically introduced by Tibor Radó in 1962 as one of the simplest examples of an uncomputable function. The proof enumerates $181,385,789$ Turing machines with 5 states and, for each machine, decides whether it halts or not. Our result marks the first determination of a new Busy Beaver value in over 40 years and the first Busy Beaver value ever to be formally verified, attesting to the effectiveness of massively collaborative online research (bbchallenge.org).
Explain it Like I'm 14
What is this paper about?
This paper proves a famous number in computer science called the fifth Busy Beaver value: S(5) = 47,176,870. That means they showed that among all tiny, 5‑state “Turing machines” that start on a blank tape and eventually stop, the one that runs the longest takes exactly 47,176,870 steps before halting. They didn’t just test machines: they created a formal, computer-checked proof using a tool called Coq, so other scientists can trust it completely.
What questions did the researchers want to answer?
- What is the exact maximum number of steps any 5‑state, 2‑symbol Turing machine can take before it stops, starting from an all‑zero (blank) tape?
- Can we verify this answer in a way that’s guaranteed correct, using a proof assistant (a program that checks math proofs)?
- Can collaborative online research and careful programming solve a problem that many people thought would be too hard?
How did they do it? (Methods explained simply)
Think of a Turing machine as a very tiny robot sitting on an infinite strip of paper (the “tape”). Each square on the tape has a symbol (like 0 or 1). The robot has a small “state” inside it (like A, B, C, …). At every step, it: 1) reads the symbol under its head, 2) writes a new symbol, 3) moves left or right, 4) switches to a new state, following its rule table.
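The read, write, move, switch cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's Coq code: the function name `run`, the dictionary encoding of the rule table, and the choice to count the halting step itself are assumptions made here. The example table is the well-known 2-state champion, which halts after 6 steps.

```python
from collections import defaultdict

# Rule table: (state, symbol read) -> (symbol to write, head move, next state).
# A missing entry is an undefined transition, which this sketch treats as halting.
BB2_CHAMPION = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    # ("B", 1) is deliberately undefined: reading 1 in state B halts.
}

def run(table, max_steps, start_state="A"):
    """Simulate from the all-zero tape.

    Returns (halted, steps, ones_on_tape). The step that hits an
    undefined transition is counted (an assumed convention).
    """
    tape = defaultdict(int)          # every cell starts as 0
    head, state = 0, start_state
    for step in range(1, max_steps + 1):
        rule = table.get((state, tape[head]))
        if rule is None:             # undefined transition: the machine halts
            return True, step, sum(tape.values())
        write, move, state = rule    # 2) write, 3) move, 4) switch state
        tape[head] = write
        head += move
    return False, max_steps, sum(tape.values())

print(run(BB2_CHAMPION, 100))
```

The `max_steps` cap matters: without it, a machine that never halts would keep the simulation running forever, which is exactly why simulation alone cannot settle every case.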
The Busy Beaver game asks: among all such robots with exactly n states and 2 symbols, which one runs the longest before it stops? That longest time is called S(n).
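Written as a formula, the definition above looks roughly like this. The symbols $\mathcal{H}_n$ and $s(M)$ are notation introduced here for illustration, and the values for $n \le 4$ are the long-established ones rather than figures taken from this summary.

```latex
% H_n : the n-state, 2-symbol Turing machines that halt from the all-zero tape
% s(M): the number of steps M performs before halting
S(n) = \max \{\, s(M) : M \in \mathcal{H}_n \,\}

% Known values:
S(1) = 1, \quad S(2) = 6, \quad S(3) = 21, \quad S(4) = 107, \quad S(5) = 47{,}176{,}870
```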
To find S(5), the team:
- Generated all the relevant 5‑state, 2‑symbol machines, carefully avoiding duplicates using a smart recipe called “Tree Normal Form.” This shrank the search from roughly 16.7 trillion possibilities down to about 181 million.
- For each of these 181,385,789 machines, they tried to decide: will it halt (stop) or run forever? Simply simulating step-by-step is sometimes enough, but not always, because “runs forever” can be tricky to prove.
- They built and proved correct a toolbox of “deciders.” A decider is like a detective: it reasons about a machine’s behavior without needing to simulate forever. Their main framework was called Closed Tape Language (CTL), which, in simple terms, creates a safe “fence” around all the kinds of tape patterns a machine can reach and shows that none of those patterns lead to halting. If the fence is closed under the machine’s rules and contains no halting point, the machine can’t halt.
- A very small number of especially complicated machines (13 of them) needed careful, custom proofs (“Sporadic Machines”). The team wrote detailed arguments for each and checked them in Coq.
- They performed all of this inside the Coq proof assistant. Coq is like a super‑strict math teacher that won’t accept any steps unless they’re proven. They wrote the algorithms, proved those algorithms are correct, ran them, and let Coq certify the final results. This approach is called “proof by reflection.”
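To make the idea of a decider concrete, here is a minimal sketch of one of the simplest possible kinds: it watches for an exactly repeated configuration (state, head position, tape contents). This is not the paper's CTL framework or its Coq code; the function name `decide_by_cycle` and the three example machines are hypothetical. It does reflect the same "fence" intuition in miniature: the configurations in a repeating cycle form a finite set that is closed under the machine's rules and contains no halting point.

```python
def decide_by_cycle(table, max_steps, start_state="A"):
    """Return "halts", "loops", or "unknown" for a run from the all-zero tape.

    table maps (state, symbol) -> (write, move, next_state); a missing
    entry is an undefined transition, treated as halting (assumed convention).
    """
    tape = {}
    head, state = 0, start_state
    seen = set()
    for _ in range(max_steps):
        # A configuration is the state, the head position, and the cells holding 1.
        ones = frozenset(pos for pos, sym in tape.items() if sym == 1)
        config = (state, head, ones)
        if config in seen:
            return "loops"        # exact repeat: the run cycles forever
        seen.add(config)
        rule = table.get((state, tape.get(head, 0)))
        if rule is None:
            return "halts"        # reached an undefined transition
        write, move, state = rule
        tape[head] = write
        head += move
    return "unknown"              # budget exhausted, nothing proven

PING_PONG = {("A", 0): (0, +1, "B"), ("B", 0): (0, -1, "A")}  # bounces between two cells
RUN_RIGHT = {("A", 0): (1, +1, "A")}                          # walks right forever
HALTER = {("A", 0): (1, +1, "B")}                             # halts on its second step

for name, machine in [("PING_PONG", PING_PONG), ("RUN_RIGHT", RUN_RIGHT), ("HALTER", HALTER)]:
    print(name, decide_by_cycle(machine, 100))
```

`RUN_RIGHT` never halts, yet it never repeats a configuration either, so this sketch can only answer "unknown". Gaps like that are why stronger deciders, which reason about whole families of tape patterns rather than exact repeats, are needed.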
They also used a community platform (bbchallenge.org) where many contributors shared ideas, code, and checks. This collaboration helped design better deciders and speed up progress.
What did they find and why does it matter?
Main results:
- They proved S(5) = 47,176,870. That means the known 5‑state champion machine really is the longest‑running one before stopping.
- This is the first new Busy Beaver value determined in over 40 years.
- It’s the first Busy Beaver value ever to be formally verified by a proof assistant, which is a big deal for trust and reproducibility in mathematics and computer science.
- They also formally verified several other small Busy Beaver values (including previous ones) and solved another class: the case with 2 states and 4 symbols.
Why it’s important:
- Busy Beaver numbers grow extremely fast and are connected to deep ideas, like the “halting problem” (the impossibility of having a program that always decides if any program will halt).
- By proving S(5) with fully checked methods, the team showed that large, tricky, exploratory proofs can be done in a reliable way.
- Their methods and tools can be used to push the frontier for bigger cases (like 6‑state machines), where the problems start to touch famous unsolved math questions.
What does this mean for the future?
- The approach—smart enumeration, powerful deciders, and formal verification—creates a strong foundation for tackling harder Busy Beaver cases.
- For 6‑state machines and beyond, some specific machines look “cryptic” (the paper calls them “Cryptids”): deciding whether they halt may be as hard as long‑standing open problems in math. This makes the Busy Beaver game an exciting way to generate new, meaningful challenges.
- The fully checked dataset and methods are great testbeds for AI systems that try to do mathematical reasoning.
- The project also shows how large, open, online collaborations can successfully produce serious research, similar to open‑source software—just with math proofs.
In short, this work determined S(5) exactly, proved it in a way everyone can trust, and opened doors to exploring even deeper problems where simple‑looking machines can hide surprising complexity.
Knowledge gaps, limitations, and open questions
Below is a consolidated list of unresolved issues the paper highlights or implies, framed to be concrete and actionable for future researchers.
- Complete, fully verified TNF enumeration for 6-state 2-symbol machines is unfinished; scale the Coq implementation to cover ~33 billion machines and formally guarantee no winners are missed.
- Determine exact values for S(6), S(3,3), and S(2,5); current status includes substantial holdouts and suspected champions but no formal resolutions.
- Prove or refute halting for identified 6-state “Cryptids,” especially:
  - Antihydra: settle whether its Collatz-like odd/even imbalance conjecture implies non-halting from all-zero tape.
  - “BMO Problem 1” machine: establish whether there exists an index i with a_i = b_i (equivalently, whether the machine halts).
- The probviously halting 3-state 3-symbol machine expected to vastly extend current lower bounds: provide a rigorous halting proof and resulting exact S(3,3) value.
- Develop CTL deciders (and other techniques) that systematically subsume the 13 Sporadic Machines, eliminating the need for bespoke, machine-specific proofs in similar future pipelines.
- Generalize and formally verify all deciders used informally by the community (e.g., FAR) within Coq, and quantify their coverage and failure modes across larger machine classes.
- Provide a rigorous methodology to turn “probviously” (probabilistic heuristic) non-halting/halting arguments into formal proofs (e.g., via supermartingales, drift conditions, or invariants), and benchmark their success on Cryptids.
- Establish nontrivial upper bounds for S(6) (beyond decidability barriers), or produce verified acceleration techniques capable of detecting halting beyond current champions.
- Complete a formal, quantitative coverage analysis of CTL: for each decider, measure what proportion of the TNF search space it decides, characterize undecided patterns, and derive new deciders targeted at specific residual families.
- Produce a comprehensive, formal taxonomy (“zoology”) of 5-state machine behaviors (e.g., counters, Gray-code generators, large chaotic pre-loopers), including precise invariants, templates, and automated recognition tools.
- Characterize all 5-state counters: give a complete classification of counter architectures achievable within 5 states, with correctness proofs and tight bounds on their step counts and tape growth.
- Find and formally prove the maximal loop length among 5-state machines without halting transitions, and develop scalable methods to certify enormous eventual loops (e.g., >$10^{51}$ steps) without full simulation.
- Investigate universality at 5 states in this model: either construct a 5-state universal Turing machine or prove impossibility under the paper’s conventions (2 symbols, bi-infinite tape, undefined-transition halting).
- Extend TNF completeness proofs and enumeration machinery to multi-symbol classes beyond (2,4), ensuring that reductions do not exclude potential winners in classes like (3,3), (3,4), and (4,3).
- Systematically study Busy Beaver values under alternative models discussed (quadruple TMs, turmites, lambda calculus): formalize translations, preserve semantics, and compute or bound corresponding S(n) variants.
- Explore Busy Beaver values from non-all-zero initial tapes (and other inputs): define precise variants, develop enumeration/decider pipelines, and determine whether small-state universality or extreme behaviors arise.
- Cross-verify Coq-BB5 in other proof assistants (e.g., Lean, Isabelle) to strengthen trust, and aim for a proof that avoids additional axioms (e.g., remove reliance on functional_extensionality_dep via alternative encoding).
- Provide human-understandable mechanistic explanations for extremely large halters (including the 5-state winner), distilling repeated macro-configurations and proving macro-step lemmas that generalize to larger classes.
- Improve parallelization, caching, and certified acceleration in Coq for large-scale enumeration/proof-by-reflection workloads; document resource footprints needed to scale to 6-state and 3-symbol classes.
- Establish a standardized benchmark suite (including Cryptids and near-threshold machines) for evaluating AI theorem provers and program analyzers on Busy Beaver-style problems, with ground-truth formal certificates.
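For readers curious what a "Collatz-like odd/even imbalance" (the Antihydra item above) can look like, here is a hedged sketch. The specific rule, the starting pair (8, 0), and the halting condition are assumptions based on how the bbchallenge community commonly describes Antihydra; none of them is stated in this summary, and the function name `antihydra_iterate` is hypothetical. Under that assumed rule, a counter `b` gains 2 on every even value and loses 1 on every odd value, and the machine would halt only if an odd value arrives when `b` is already 0.

```python
def antihydra_iterate(a=8, b=0, max_iters=1000):
    """Iterate the ASSUMED Antihydra rule; return (a, b, halted).

    Assumed rule (not taken from this summary):
      a even        -> (3a/2,       b + 2)
      a odd, b > 0  -> ((3a - 1)/2, b - 1)
      a odd, b == 0 -> halt
    """
    for _ in range(max_iters):
        if a % 2 == 0:
            a, b = 3 * a // 2, b + 2
        elif b > 0:
            a, b = (3 * a - 1) // 2, b - 1
        else:
            return a, b, True     # odd value with an empty counter: halt
    return a, b, False            # no halt seen within the budget (proves nothing)

print(antihydra_iterate(max_iters=4))
```

Running more iterations only shows that no halt occurred within the budget. Proving that the imbalance never occurs is the open problem.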