MathSticks: Geometric & Algorithmic Frameworks
- MathSticks is a collection of frameworks exploring the combinatorial, geometric, and algorithmic properties of sticks and matchsticks.
- It features rigorous benchmarks, including visual symbolic compositional reasoning puzzles that integrate visual perception, symbolic manipulation, and arithmetic accuracy.
- The frameworks also extend to combinatorial knot theory and discrete partition analysis, addressing classic problems like the broken stick and polygon formation challenges.
MathSticks encompasses a collection of mathematical and computational frameworks, models, and benchmarks centered on the combinatorial, geometric, and algorithmic properties of sticks, matchsticks, or straight-line segments. It spans classic probability problems (the "broken stick"), polygon and knot theory representations, and, most recently, systematic machine reasoning challenges involving matchstick manipulations that exercise visual, symbolic, and arithmetic skills in artificial intelligence systems.
1. Visual Symbolic Compositional Reasoning with MathSticks
The \textsc{MathSticks} benchmark is a rigorous suite for evaluating Visual Symbolic Compositional Reasoning (VSCR) via matchstick equation puzzles (Ji et al., 1 Oct 2025). Here, seven-segment digital glyphs spell out a visually rendered, incorrect sum or difference of the form $A \circ B = C$, where the operator $\circ \in \{+, -\}$. The only legal edit is to relocate physical sticks between segments: strict stick conservation is enforced, so no stick may be deleted or inserted. Task success requires three integrated capacities:
- Visual perception: correctly identify which segments are illuminated in each glyph (segments A0–A6, B0–B6, …, plus the operator segment G0 for $\circ$).
- Symbolic manipulation: plan sequences of Move(s, t) operations, each denoting the transfer of a stick from source segment $s$ to target segment $t$.
- Arithmetic consistency: after the modification, the decoded equation must be arithmetically correct.
Formally, given an incorrect equation $A \circ B = C$ with $\circ \in \{+, -\}$, the solution set is constructed by enumerative search over all legal one- and two-stick moves, keeping those whose decoded equation is arithmetically correct. The dataset contains over 1.4 million distinct puzzles, stratified by digit scale, move complexity, solution multiplicity, and operator flipping.
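To make the three capacities concrete, here is a minimal Python sketch of the single-digit case: each glyph is a set of lit segments, Move(s, t) relocates one stick, and a brute-force search keeps the one-stick moves that decode to a correct equation. The segment numbering, the glyph shapes (6 with its top bar, 7 with three sticks, 9 with its bottom bar), the fixed equals sign, and the names `encode`, `apply_move`, and `one_move_solutions` are illustrative assumptions, not the benchmark's actual code.

```python
from itertools import product

# Hypothetical segment labels (standard a-g order): 0=top, 1=upper-right,
# 2=lower-right, 3=bottom, 4=lower-left, 5=upper-left, 6=middle.
DIGITS = {
    0: {0, 1, 2, 3, 4, 5}, 1: {1, 2}, 2: {0, 1, 3, 4, 6},
    3: {0, 1, 2, 3, 6}, 4: {1, 2, 5, 6}, 5: {0, 2, 3, 5, 6},
    6: {0, 2, 3, 4, 5, 6}, 7: {0, 1, 2}, 8: {0, 1, 2, 3, 4, 5, 6},
    9: {0, 1, 2, 3, 5, 6},
}
GLYPH_TO_DIGIT = {frozenset(s): d for d, s in DIGITS.items()}
# Operator glyph: slot 0 is the horizontal bar, slot 1 the vertical bar (G0).
GLYPH_TO_OP = {frozenset({0}): "-", frozenset({0, 1}): "+"}
SLOTS = [7, 2, 7, 7]  # slots per glyph in the layout A, op, B, C ("=" is fixed)


def encode(a, op, b, c):
    op_glyph = {0} if op == "-" else {0, 1}
    return [frozenset(DIGITS[a]), frozenset(op_glyph),
            frozenset(DIGITS[b]), frozenset(DIGITS[c])]


def decode(glyphs):
    """Return (a, op, b, c), or None if any glyph is not a legal symbol."""
    a, b, c = (GLYPH_TO_DIGIT.get(glyphs[i]) for i in (0, 2, 3))
    op = GLYPH_TO_OP.get(glyphs[1])
    return None if None in (a, op, b, c) else (a, op, b, c)


def is_correct(a, op, b, c):
    return (a + b if op == "+" else a - b) == c


def apply_move(glyphs, src, dst):
    """Move(s, t): lift the stick at src=(glyph, slot) and drop it at dst.
    Nothing is added or deleted, so the stick count is conserved."""
    new = [set(g) for g in glyphs]
    new[src[0]].remove(src[1])
    new[dst[0]].add(dst[1])
    return [frozenset(g) for g in new]


def one_move_solutions(a, op, b, c):
    """Every (src, dst, decoded equation) whose single move fixes the puzzle."""
    glyphs = encode(a, op, b, c)
    lit = [(g, s) for g in range(4) for s in glyphs[g]]
    empty = [(g, s) for g in range(4) for s in range(SLOTS[g])
             if s not in glyphs[g]]
    solutions = []
    for src, dst in product(lit, empty):
        eq = decode(apply_move(glyphs, src, dst))
        if eq is not None and is_correct(*eq):
            solutions.append((src, dst, eq))
    return solutions


print(((0, 5), (3, 4), (3, "+", 5, 8)) in one_move_solutions(9, "+", 5, 9))  # True
```

For example, the puzzle 9 + 5 = 9 is solved by moving the upper-left stick of the first 9 to the lower-left slot of the result, giving 3 + 5 = 8. Two-stick search would nest a second call to `apply_move` over the states produced by the first.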
2. MathSticks in Combinatorial Knot and Polygon Theory
Separately, the "stick" terminology also appears in knot theory, notably in the study of stick numbers and polygonal knot representations (Cantarella et al., 25 Aug 2025). Here, a "stick" is a straight segment of a non-self-intersecting, piecewise-linear closed curve in $\mathbb{R}^3$ representing a particular knot type $K$. The central invariant, the stick number $s(K)$, is the minimal number of segments required for any polygonal realization ambient isotopic to $K$:
$s(K) = \min\{\, n : \text{some closed polygon with } n \text{ sticks is ambient isotopic to } K \,\}.$
In polygon-forming problems, the "broken stick" and its variants address the conditions under which arbitrary subdivisions of a segment can form valid polygons, leading to discrete analogs of classical probability results and partition enumeration via generating functions (Verreault, 2021).
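The condition behind these problems is the polygon inequality: positive lengths close up into a non-degenerate polygon exactly when the longest piece is shorter than the sum of the others, i.e., shorter than half the total. The Python sketch below encodes that test together with the classical continuous result that the $n$ pieces of a stick broken at $n-1$ uniform random points form an $n$-gon with probability $1 - n/2^{n-1}$ (each piece exceeds half the stick with probability $2^{-(n-1)}$, and at most one piece can). The function names are hypothetical, and the simulation is only a sanity check.

```python
import random
from fractions import Fraction


def forms_polygon(pieces):
    """Polygon inequality: the longest side must be strictly shorter than the rest combined."""
    longest = max(pieces)
    return longest < sum(pieces) - longest


def p_forms_ngon(n):
    """Classical probability that n pieces of a uniformly broken stick form an n-gon."""
    return 1 - Fraction(n, 2 ** (n - 1))


def simulate_ngon(n, trials=100_000, seed=1):
    """Monte Carlo estimate with n - 1 independent uniform break points."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        cuts = sorted(rng.random() for _ in range(n - 1))
        pieces = [b - a for a, b in zip([0.0] + cuts, cuts + [1.0])]
        hits += forms_polygon(pieces)
    return hits / trials


print(p_forms_ngon(3), p_forms_ngon(4))  # 1/4 1/2
```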
3. Algorithmic Construction of MathSticks Datasets and Benchmarks
The construction of the \textsc{MathSticks} benchmark follows a two-phase enumeration and rendering approach (Ji et al., 1 Oct 2025). First, each candidate configuration is tested for whether its equation $A \circ B = C$ already holds, in which case it is skipped; otherwise, all states reachable by one- or two-stick moves are generated using comprehensive lookup tables. Filtering retains only the moves that yield mathematically valid equations. Solutions are tagged by:
- Digit scale: Levels 1–4, defined by the digit width of the operands.
- Move complexity: whether one or two moves (or both) suffice.
- Solution multiplicity: unique or multiple valid corrections.
- Operator flipping: whether the operator $\circ$ (segment G0) changes, e.g., $+$ becoming $-$.
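The lookup-table step can be sketched at the digit level in Python: for every digit, precompute which digits are reachable by removing one stick, adding one stick, or moving one stick inside the glyph, then combine the tables to expand an equation. This is a minimal sketch under assumed glyph shapes; the names `build_tables` and `one_move_neighbors` are hypothetical, the operator glyph is ignored, and a full generator would also track segment identities to emit exact Move(s, t) labels.

```python
# Hypothetical digit shapes; segments numbered in standard a-g order:
# 0=top, 1=upper-right, 2=lower-right, 3=bottom, 4=lower-left, 5=upper-left, 6=middle.
DIGITS = {
    0: {0, 1, 2, 3, 4, 5}, 1: {1, 2}, 2: {0, 1, 3, 4, 6},
    3: {0, 1, 2, 3, 6}, 4: {1, 2, 5, 6}, 5: {0, 2, 3, 5, 6},
    6: {0, 2, 3, 4, 5, 6}, 7: {0, 1, 2}, 8: {0, 1, 2, 3, 4, 5, 6},
    9: {0, 1, 2, 3, 5, 6},
}
GLYPH_TO_DIGIT = {frozenset(s): d for d, s in DIGITS.items()}


def build_tables():
    """Digit-level lookup tables for one-stick edits of a single glyph."""
    remove = {d: set() for d in DIGITS}  # digits reachable by taking one stick away
    add = {d: set() for d in DIGITS}     # digits reachable by adding one stick
    move = {d: set() for d in DIGITS}    # digits reachable by moving one stick within the glyph
    for d, segs in DIGITS.items():
        off = set(range(7)) - segs
        for s in segs:
            r = GLYPH_TO_DIGIT.get(frozenset(segs - {s}))
            if r is not None:
                remove[d].add(r)
            for t in off:
                m = GLYPH_TO_DIGIT.get(frozenset((segs - {s}) | {t}))
                if m is not None:
                    move[d].add(m)
        for t in off:
            a = GLYPH_TO_DIGIT.get(frozenset(segs | {t}))
            if a is not None:
                add[d].add(a)
    return remove, add, move


def one_move_neighbors(digits, remove, add, move):
    """Digit tuples reachable from `digits` by relocating exactly one stick."""
    out = set()
    for i, d in enumerate(digits):
        for m in move[d]:  # the stick stays inside glyph i
            out.add(digits[:i] + (m,) + digits[i + 1:])
        for j, e in enumerate(digits):  # the stick travels from glyph i to glyph j
            if i == j:
                continue
            for r in remove[d]:
                for a in add[e]:
                    new = list(digits)
                    new[i], new[j] = r, a
                    out.add(tuple(new))
    return out


remove, add, move = build_tables()
print(sorted(remove[8]))  # [0, 6, 9]
```

Expanding the digits of 9 + 5 = 9 this way yields, among others, (3, 5, 8); a generator would then check arithmetic, repeat the expansion for two-stick moves, and attach the tags listed above.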
Images are synthesized from digit templates (PNG/SVG), with all moves precisely tracked. The final dataset (1,411,388 solvable puzzles) is heavily skewed toward high-difficulty instances (Level 4), and a carefully curated stratified test set (400 puzzles) balances all major axes.
In knot theory, polygonal knots with as few sticks as possible are constructed via simulated annealing using lattice-based or off-lattice moves, subject to ambient isotopy constraints and efficient self-intersection checks. Systematic improvement of upper bounds on stick numbers exploits both local moves (BFACF-style) and more global fold and triangle moves (Cantarella et al., 25 Aug 2025).
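Every candidate move in such a search must be rejected if it makes the curve self-intersect. The Python sketch below shows the simplest version of that test: a brute-force O(n²) comparison of non-adjacent edges using the standard clamped closest-point computation between two segments (the formulation given in Ericson's Real-Time Collision Detection), plus a stick count that ignores vertices where the curve does not turn. It is an illustrative assumption, not the efficient checks described for the cited work, and it does not identify knot type, which would require knot invariants. The names `segment_distance`, `is_embedded`, and `stick_count` are hypothetical.

```python
import itertools
import math


def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def clamp(x):
    return max(0.0, min(1.0, x))


def segment_distance(p1, q1, p2, q2):
    """Closest distance between segments p1q1 and p2q2 (both of nonzero length)."""
    d1, d2, r = sub(q1, p1), sub(q2, p2), sub(p1, p2)
    a, e, f = dot(d1, d1), dot(d2, d2), dot(d2, r)
    b, c = dot(d1, d2), dot(d1, r)
    denom = a * e - b * b
    s = clamp((b * f - c * e) / denom) if denom > 1e-12 else 0.0  # parallel: any s works
    t = (b * s + f) / e
    if t < 0.0:
        t, s = 0.0, clamp(-c / a)
    elif t > 1.0:
        t, s = 1.0, clamp((b - c) / a)
    c1 = tuple(p + s * d for p, d in zip(p1, d1))
    c2 = tuple(p + t * d for p, d in zip(p2, d2))
    return math.dist(c1, c2)


def is_embedded(vertices, tol=1e-9):
    """True if the closed polygon v0 v1 ... v(n-1) v0 has no self-intersections."""
    n = len(vertices)
    edges = [(vertices[i], vertices[(i + 1) % n]) for i in range(n)]
    if any(math.dist(p, q) <= tol for p, q in edges):
        return False  # degenerate zero-length stick
    for k in range(n):  # adjacent edges may only meet at their shared vertex
        u, w = sub(vertices[k], vertices[k - 1]), sub(vertices[(k + 1) % n], vertices[k])
        if math.hypot(*cross(u, w)) <= tol and dot(u, w) < 0:
            return False  # the next stick folds back along the previous one
    for i, j in itertools.combinations(range(n), 2):
        adjacent = j == i + 1 or (i == 0 and j == n - 1)
        if not adjacent and segment_distance(*edges[i], *edges[j]) <= tol:
            return False
    return True


def stick_count(vertices, tol=1e-9):
    """Number of maximal straight segments: count only vertices where the curve turns."""
    n = len(vertices)
    return sum(
        math.hypot(*cross(sub(vertices[k], vertices[k - 1]),
                          sub(vertices[(k + 1) % n], vertices[k]))) > tol
        for k in range(n)
    )
```

A square is embedded with 4 sticks (adding a midpoint on one side leaves the count at 4), whereas a planar "bowtie" quadrilateral is rejected because two of its non-adjacent edges cross.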
4. Evaluation Protocols and Empirical Findings
The \textsc{MathSticks} benchmark assesses model performance in two regimes:
- Text-prompted: the equation string is presented with the image. Models skip OCR and focus on symbolic planning.
- Pure-visual: only the matchstick image is shown. Models must handle full visual parsing and compositional reasoning.
The metric is exact-match solution accuracy: a predicted move sequence counts as correct only if it matches one of the (potentially multiple) ground-truth Move(·) sequences.
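A minimal Python sketch of this scoring rule follows. Moves are written as (source, target) segment-label pairs in the style of A0–A6 and G0. Treating a two-stick answer as an unordered set of moves and counting unparseable outputs (None) as wrong are assumptions of this sketch, and the function names are hypothetical.

```python
def normalize(moves):
    """Canonical form of an answer: an unordered set of (source, target) pairs.
    Ignoring move order is an assumption of this sketch."""
    return frozenset(moves)


def exact_match_accuracy(predictions, gold_solutions):
    """predictions[i]: list of (src, tgt) moves, or None if the output did not parse.
    gold_solutions[i]: list of acceptable move lists for puzzle i."""
    if len(predictions) != len(gold_solutions):
        raise ValueError("one prediction per puzzle is required")
    hits = 0
    for pred, golds in zip(predictions, gold_solutions):
        if pred is not None and normalize(pred) in {normalize(g) for g in golds}:
            hits += 1
    return hits / len(predictions)


preds = [[("A5", "C4")], [("B1", "G0"), ("A6", "C6")], None]
golds = [[[("A5", "C4")], [("C2", "A4")]],
         [[("A6", "C6"), ("B1", "G0")]],
         [[("A0", "B0")]]]
print(exact_match_accuracy(preds, golds))  # 0.6666666666666666 (2 of 3 correct)
```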
Empirical results spanning 14 vision-LLMs indicate substantial limitations. Closed-source systems (o3, Gemini 2.5 Pro/Flash, etc.) achieve up to 60% mean accuracy in the text-prompted regime but no more than 38.5% in the pure-visual regime, while open-source models collapse to 0% in both. Human participants, by contrast, achieve 91.7% mean accuracy under visual-only conditions. Breakdowns by move complexity and operator flipping show that two-stick and sign-flip instances are especially challenging; most errors stem from perception deficits, planning violations, arithmetic lapses, mishandling of the operator, or incorrect output format.
In knot stick number computations, simulated annealing produces new upper bounds on $s(K)$, conclusively determining $s(K)$ for numerous previously unresolved 11- and 13-crossing prime knots and providing comprehensive tabulations for all prime knots through the crossing numbers studied (Cantarella et al., 25 Aug 2025).
5. Discrete Partition Analysis and the "Broken Stick" Problem
The classic "broken stick" problem, generalized in (Verreault, 2021), asks for the probability that $n$ pieces (from subdivisions of a stick) can or cannot form a $k$-gon. The discrete approach encodes the polygon-forming inequalities as Diophantine constraints on the piece lengths $a_1, \dots, a_n$ and derives their generating function via MacMahon partition analysis. The resulting expressions involve quantities $f_{k-1}(i)$ and $h_{k-1}(j)$ built from partial sums of the $(k-1)$-step Fibonacci numbers. The limiting probability that no $k$-subset of the pieces forms a $k$-gon is then
$P(\text{no $k$-gon}) = \frac{n!} {\left(\prod_{i=k-2}^{n}f_{k-1}(i)\right)\left(\prod_{j=2}^{k-2}h_{k-1}(j)\right)}.$
This extends and unifies discrete and continuous broken-stick probability models, including the classical formula for $k$-gons.
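In the classical triangle case $k = 3$ the second product in the formula is empty, and a direct change of variables on the sorted pieces suggests $f_2(i) = F_1 + \dots + F_i = F_{i+2} - 1$, which reproduces the well-known value $3/4$ for three pieces. The Python sketch below evaluates the formula under that assumption and cross-checks it against a Monte Carlo simulation with uniformly placed breaks. It does not reproduce the general $f_{k-1}$ and $h_{k-1}$ of Verreault (2021), and the helper names are hypothetical.

```python
import math
import random
from fractions import Fraction


def fib_partial_sum(i):
    """F_1 + ... + F_i for the ordinary (2-step) Fibonacci numbers; equals F_{i+2} - 1."""
    a, b, total = 1, 1, 0  # a runs through F_1, F_2, ...
    for _ in range(i):
        total += a
        a, b = b, a + b
    return total


def p_no_triangle(n):
    """Formula above with k = 3: the h-product is empty and f_2(i) is taken
    to be the Fibonacci partial sum (an assumption checked by simulation)."""
    denom = math.prod(fib_partial_sum(i) for i in range(1, n + 1))
    return Fraction(math.factorial(n), denom)


def simulate_no_triangle(n, trials=100_000, seed=0):
    """Monte Carlo estimate with n - 1 independent uniform break points."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        cuts = sorted(rng.random() for _ in range(n - 1))
        pieces = sorted(b - a for a, b in zip([0.0] + cuts, cuts + [1.0]))
        # In sorted order, no three pieces form a triangle iff every piece is at
        # least the sum of the two pieces just below it.
        if all(pieces[i] >= pieces[i - 1] + pieces[i - 2] for i in range(2, n)):
            hits += 1
    return hits / trials


print(p_no_triangle(3), p_no_triangle(4))  # 3/4 3/7
```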
6. Limitations, Open Problems, and Future Directions
Benchmark analyses and empirical tests reveal persistent capability gaps in contemporary vision-LLMs on MathSticks, especially for VSCR tasks that integrate low-level perception with symbolic manipulation and combinatorial arithmetic (Ji et al., 1 Oct 2025). Directions for closing these gaps include:
- Developing neuro-symbolic architectures that encode segment–graph structure with explicit symbolic solvers.
- Pretraining on matchstick-style diagram edits or other constraint-satisfaction datasets.
- Augmenting chain-of-thought prompting with candidate move enumeration and forward simulation.
- Multi-modal finetuning using gold Move(·) demonstrations to instill strict conservation and arithmetic awareness.
Open problems in knot stick number theory include proving conjectured stick-number relations for all knot types, establishing whether the equilateral stick number always equals the stick number, and determining the distribution of $s(K)$ at higher crossing numbers (Cantarella et al., 25 Aug 2025). Discrete partition analysis opens further exploration toward non-uniform break distributions, higher-dimensional "broken simplex" analogs, and asymptotic connections to continuous models (Verreault, 2021).
In sum, MathSticks links combinatorial geometry, symbolic reasoning, computational topology, and AI benchmarking in a cohesive mathematical and empirical framework, with significant implications for the next generation of machine reasoning and mathematical modeling.