MathSticks: Geometric & Algorithmic Frameworks

Updated 28 December 2025
  • MathSticks is a collection of frameworks exploring the combinatorial, geometric, and algorithmic properties of sticks and matchsticks.
  • It features rigorous benchmarks, including visual symbolic compositional reasoning puzzles that integrate visual perception, symbolic manipulation, and arithmetic accuracy.
  • The frameworks also extend to combinatorial knot theory and discrete partition analysis, addressing classic problems like the broken stick and polygon formation challenges.

MathSticks encompasses a collection of mathematical and computational frameworks, models, and benchmarks centered on the combinatorial, geometric, and algorithmic properties of sticks, matchsticks, or straight-line segments. It spans classic probability problems (the "broken stick"), polygon and knot theory representations, and, most recently, systematic machine reasoning challenges involving matchstick manipulations that exercise visual, symbolic, and arithmetic skills in artificial intelligence systems.

1. Visual Symbolic Compositional Reasoning with MathSticks

The \textsc{MathSticks} benchmark is a rigorous suite for evaluating Visual Symbolic Compositional Reasoning (VSCR) via matchstick equation puzzles (Ji et al., 1 Oct 2025). Each puzzle renders an incorrect sum or difference of the form $x = [a \oplus b = c]$, with operator $\oplus \in \{+, -\}$, on a seven-segment digital display. The only legal edit is to relocate $k \in \{1, 2\}$ physical sticks between segments; strict stick conservation is enforced, so no stick may be deleted or inserted. Task success requires three integrated capacities:

  • Visual perception: correctly identify which segments are illuminated across each glyph (segments A0–A6, B0–B6, …, plus the operator segment G0 that distinguishes $+$ from $-$).
  • Symbolic manipulation: plan sequences of Move(s, t) operations, each denoting the transfer of a stick from source segment $s$ to target segment $t$.
  • Arithmetic consistency: after the modification, the decoded equation $A \oplus B = C$ must be arithmetically correct.

Formally, for $z = [a,b,g,c,d,e,f]$, the operands are decoded as $A = 10\cdot \max(a,0)+b$, $B = 10\cdot \max(c,0)+d$, and $C = 10\cdot \max(e,0)+f$, and the solution set is constructed via enumerative search over all legal one- and two-stick moves. The dataset comprises over 1.4 million distinct puzzles, stratified by digit scale, move complexity, solution multiplicity, and operator flipping.
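The decoding rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not say how a blank tens digit or the operator slot $g$ is encoded, so the sketch assumes a blank digit is stored as -1 (which is what makes $\max(\cdot, 0)$ meaningful) and that $g = 1$ denotes $+$ while any other value denotes $-$. The function names `decode` and `is_correct` are hypothetical.

```python
from typing import Sequence


def decode(z: Sequence[int]) -> tuple[int, str, int, int]:
    """Decode z = [a, b, g, c, d, e, f] into (A, op, B, C).

    Assumptions (not confirmed by the source): a blank tens digit is
    encoded as -1, and g == 1 means '+', anything else means '-'.
    """
    a, b, g, c, d, e, f = z
    A = 10 * max(a, 0) + b  # a blank tens digit contributes 0
    B = 10 * max(c, 0) + d
    C = 10 * max(e, 0) + f
    op = "+" if g == 1 else "-"
    return A, op, B, C


def is_correct(z: Sequence[int]) -> bool:
    """Check the arithmetic-consistency condition A (+/-) B = C."""
    A, op, B, C = decode(z)
    return A + B == C if op == "+" else A - B == C
```

Under these assumptions, `is_correct([-1, 3, 1, -1, 4, -1, 7])` checks the equation 3 + 4 = 7, and `is_correct([1, 2, 0, -1, 5, -1, 7])` checks 12 - 5 = 7.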

2. MathSticks in Combinatorial Knot and Polygon Theory

Separately, "MathSticks" terminology also appears in knot theory, notably in the context of stick numbers and polygonal knot representations (Cantarella et al., 25 Aug 2025). Here, a "stick" refers to a straight segment forming part of a non-self-intersecting, piecewise-linear closed curve in $\mathbb{R}^3$, representing a particular knot type $K$. The central invariant, the stick number $s(K)$, is the minimal number of segments required for any polygonal realization ambient isotopic to $K$:

$s(K) = \min\{\, n \mid \exists\, P \text{ with } n \text{ edges and knot-type}(P) = K \,\}$

In polygon forming problems, the "broken stick" and its variants address the conditions under which arbitrary subdivisions of a segment can form valid polygons, leading to discrete analogs of classical probability results and partition enumeration via generating functions (Verreault, 2021).

3. Algorithmic Construction of MathSticks Datasets and Benchmarks

The construction of the \textsc{MathSticks} benchmark follows a two-phase enumeration and rendering approach (Ji et al., 1 Oct 2025). First, each candidate configuration $z$ is tested for whether $A \oplus B = C$ is already satisfied; if it is not, all states $S_1(z)$ and $S_2(z)$ reachable by one- or two-stick moves are generated using comprehensive lookup tables $T_1, T_2$. Filtering retains only those resulting in mathematically valid equations. Solutions are tagged by:

  • Digit scale: Levels 1–4, defined by the digit width of the operand space.
  • Move complexity: whether one or two moves (or both) suffice.
  • Solution multiplicity: unique or multiple valid corrections.
  • Operator flipping: whether $G_0$ (the segment distinguishing $+$ from $-$) changes.

Images are synthesized from digit templates (PNG/SVG), with all moves precisely tracked. The final dataset (1,411,388 solvable puzzles) is heavily skewed toward high-difficulty instances (Level 4), and a carefully curated stratified test set (400 puzzles) balances all major axes.
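The enumerate-then-filter idea can be illustrated with a much smaller sketch. It is not the benchmark's generator: it handles only single-digit operands and one-stick moves, it assumes the common seven-segment glyphs listed in `SEGMENTS` (the benchmark's own templates may differ, for example in how 7 is drawn), and it assumes a plus sign is a minus sign with one extra stick. The per-digit tables returned by the hypothetical `transitions` function play a role loosely analogous to the lookup table $T_1$.

```python
from itertools import permutations

# Assumed seven-segment glyphs (segments a-g); not the benchmark's templates.
SEGMENTS = {0: "abcdef", 1: "bc", 2: "abdeg", 3: "abcdg", 4: "bcfg",
            5: "acdfg", 6: "acdefg", 7: "abc", 8: "abcdefg", 9: "abcdfg"}
GLYPH = {d: frozenset(s) for d, s in SEGMENTS.items()}


def transitions():
    """Per-digit tables: digits reachable by moving, losing, or gaining one stick."""
    move, lose, gain = {}, {}, {}
    for d, g in GLYPH.items():
        move[d] = sorted(e for e, h in GLYPH.items()
                         if len(h) == len(g) and len(g - h) == 1)
        lose[d] = sorted(e for e, h in GLYPH.items() if h < g and len(g - h) == 1)
        gain[d] = sorted(e for e, h in GLYPH.items() if g < h and len(h - g) == 1)
    return move, lose, gain


def holds(a, op, b, c):
    return a + b == c if op == "+" else a - b == c


def one_move_fixes(a, op, b, c):
    """Enumerate every one-stick move on 'a op b = c' and keep the valid equations."""
    move, lose, gain = transitions()
    digits = (a, b, c)
    candidates = set()

    def put(i, e, base=digits):
        return base[:i] + (e,) + base[i + 1:]

    # (1) Relocate a stick inside a single digit.
    for i in range(3):
        for e in move[digits[i]]:
            candidates.add((put(i, e), op))
    # (2) Take a stick from digit i and place it on digit j.
    for i, j in permutations(range(3), 2):
        for e in lose[digits[i]]:
            for h in gain[digits[j]]:
                candidates.add((put(j, h, put(i, e)), op))
    # (3) Operator flip: '+' donates its extra stick, or '-' receives one.
    if op == "+":
        for j in range(3):
            for h in gain[digits[j]]:
                candidates.add((put(j, h), "-"))
    else:
        for i in range(3):
            for e in lose[digits[i]]:
                candidates.add((put(i, e), "+"))
    # Filtering step: retain only arithmetically valid results.
    return sorted((x, o, y, z) for (x, y, z), o in candidates if holds(x, o, y, z))
```

For the incorrect equation 6 + 4 = 4, `one_move_fixes(6, "+", 4, 4)` returns `[(0, '+', 4, 4), (8, '-', 4, 4)]`: one fix that stays within a digit and one that flips the operator, which mirrors the solution-multiplicity and operator-flipping tags described above.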

In knot theory, polygonal knots with minimal sticks are constructed via simulated annealing and lattice-based or off-lattice moves, subject to ambient isotopy constraints and efficient self-intersection checks. Systematic improvement of upper bounds on stick numbers exploits both local moves (BFACF-style) and more global fold and triangle moves (Cantarella et al., 25 Aug 2025).

4. Evaluation Protocols and Empirical Findings

The \textsc{MathSticks} benchmark assesses model performance in two regimes:

  • Text-prompted: the equation string is presented with the image. Models skip OCR and focus on symbolic planning.
  • Pure-visual: only the matchstick image is shown. Models must handle full visual parsing and compositional reasoning.

The metric is exact-match solution accuracy: a predicted move sequence counts as correct only if it matches one of the (potentially multiple) ground-truth Move(·) solutions for that puzzle.
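A minimal sketch of exact-match scoring is given below. The source does not specify the comparison rules, so two assumptions are made and should be treated as such: each move is represented as a (source, target) pair of segment labels such as ("A0", "C3"), and the order of moves within a two-stick solution is ignored. The names `exact_match` and `accuracy` are hypothetical.

```python
from typing import Iterable, Sequence

Move = tuple[str, str]  # (source segment, target segment), e.g. ("A0", "C3")


def exact_match(pred: Iterable[Move], gold_solutions: Iterable[Iterable[Move]]) -> bool:
    """True if the predicted moves equal any one ground-truth solution.

    Assumption: move order within a solution does not matter, so each
    solution is compared as an unordered set of moves.
    """
    return frozenset(pred) in {frozenset(sol) for sol in gold_solutions}


def accuracy(preds: Sequence[Iterable[Move]],
             golds: Sequence[Iterable[Iterable[Move]]]) -> float:
    """Fraction of puzzles whose prediction exactly matches a gold solution."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not preds:
        return 0.0
    hits = sum(exact_match(p, g) for p, g in zip(preds, golds))
    return hits / len(preds)
```

If the benchmark instead treats move order as significant, replacing `frozenset` with `tuple` gives the stricter comparison.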

Empirical results spanning 14 vision-LLMs indicate substantial limitations. Closed-source systems (o3, Gemini 2.5 Pro/Flash, etc.) achieve up to 60% mean accuracy in the text-prompted regime, but no more than 38.5% in visual-only tasks; open-source models collapse to 0% in both. Human participants, conversely, achieve 91.7% mean accuracy under visual-only conditions. Breakdown by move complexity and operator-flipping reveals that two-stick and sign-flip instances are especially challenging; most errors stem from perception deficits, planning violations, arithmetic lapses, mishandling of $G_0$, or incorrect output format.

In knot stick number computations, simulated annealing produces new upper bounds on $s(K)$, conclusively determining $s(K)$ for numerous 11- and 13-crossing prime knots previously unresolved, and providing comprehensive tabulations for all prime knots through $c = 13$ crossings (Cantarella et al., 25 Aug 2025).

5. Discrete Partition Analysis and the "Broken Stick" Problem

The classic "broken stick" problem, generalized in (Verreault, 2021), investigates the probability that $k$ pieces (from $n$ subdivisions of a stick) can or cannot form a $k$-gon. The discrete approach encodes the polygon-forming inequalities as Diophantine constraints on the piece lengths $a_1,\dots,a_n$, deriving a generating function via MacMahon partition analysis:

$G(q) = \prod_{i=k-2}^{n}\frac{1}{1-q^{f_{k-1}(i)}} \prod_{j=2}^{k-2}\frac{1}{1-q^{h_{k-1}(j)}}$

where $f_{r}(i)$ and $h_{r}(j)$ involve partial sums of the $r$-step Fibonacci numbers. The limiting probability that no $k$-subset of the $n$ pieces forms a $k$-gon is then

$P(\text{no $k$-gon}) = \frac{n!} {\left(\prod_{i=k-2}^{n}f_{k-1}(i)\right)\left(\prod_{j=2}^{k-2}h_{k-1}(j)\right)}.$

This extends and unifies discrete and continuous broken stick probability models, including the classical $1 - n/2^{n-1}$ formula for $n$-gons.
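The classical continuous result, that a stick broken at $n-1$ uniformly random points yields $n$ pieces forming an $n$-gon with probability $1 - n/2^{n-1}$, can be checked numerically. The sketch below is a plain Monte Carlo estimate, not the partition-analysis method of the cited work; it relies on the standard fact that pieces of a unit stick form a polygon exactly when the longest piece is shorter than $1/2$. The function names, trial count, and seed are illustrative choices.

```python
import random


def polygon_probability(n: int, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate P(n pieces of a unit stick form an n-gon) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Break the unit stick at n - 1 independent uniform points.
        cuts = sorted(rng.random() for _ in range(n - 1))
        pieces = [hi - lo for lo, hi in zip([0.0] + cuts, cuts + [1.0])]
        # Polygon inequality: the longest piece is shorter than the rest combined.
        if max(pieces) < 0.5:
            hits += 1
    return hits / trials


def closed_form(n: int) -> float:
    """Classical formula 1 - n / 2^(n-1)."""
    return 1 - n / 2 ** (n - 1)
```

For $n = 3$ the closed form gives $1/4$ and for $n = 4$ it gives $1/2$; the simulated estimates should agree with these values up to sampling noise.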

6. Limitations, Open Problems, and Future Directions

Benchmark analyses and empirical tests reveal persistent capability gaps in contemporary vision-LLMs on MathSticks, especially for VSCR tasks that integrate low-level perception with symbolic manipulation and combinatorial arithmetic (Ji et al., 1 Oct 2025). Directions for closing these gaps include:

  • Developing neuro-symbolic architectures that encode segment–graph structure with explicit symbolic solvers.
  • Pretraining on matchstick-style diagram edits or constraint-satisfaction domain datasets.
  • Augmenting chain-of-thought prompting with candidate move enumeration and forward simulation.
  • Multi-modal finetuning using gold Move(·) demonstrations to instill strict conservation and arithmetic awareness.

Problems in knot stick number theory remain, such as proving $s(K)\leq c(K)$ for all $c(K)\geq 12$, equilateral stick number equivalence, or enumerating the distribution of $s(K)$ for higher crossing numbers (Cantarella et al., 25 Aug 2025). Discrete partition analysis opens further exploration toward non-uniform break distributions, higher-dimensional "broken simplex" analogs, and asymptotic connections to continuous models (Verreault, 2021).

In sum, MathSticks links combinatorial geometry, symbolic reasoning, computational topology, and AI benchmarking in a cohesive mathematical and empirical framework, with significant implications for the next generation of machine reasoning and mathematical modeling.
