Identify a precise MathLib “macro set” that corresponds to the monoid macro set

Determine a precise subset of MathLib elements that serves as a “macro set” corresponding to the macro set in the monoid model of human mathematics, and give an explicit mapping between the two that accounts for the compression and hierarchical depth observed in MathLib and is consistent with the monoid-based analysis.

Background

The paper models human mathematics (HM) via monoids where named substrings (“macros”) enable compression and defines expansion properties that match observations from MathLib, a large Lean library used as a proxy for HM. While the monoid side has a clearly defined macro set, the authors do not identify an explicit, analogous set within MathLib that plays the same role.

Finding such a set would clarify how definitional and theorem-level abstractions in MathLib produce the measured exponential growth of unwrapped length with depth while keeping wrapped length approximately constant. It would also enable a more direct comparison between theory and data and could guide automated reasoning toward compressible regions.
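The contrast between wrapped and unwrapped length can be made concrete with a toy example. The sketch below is purely illustrative: the macro table `MACROS`, the generators `a` and `b`, and the helper names are hypothetical, and it is neither the paper's construction nor derived from MathLib data. Each macro's body has two tokens (its wrapped length), but fully expanding nested macros down to generators doubles the unwrapped length with every additional level of depth.

```python
from functools import lru_cache

# Hypothetical toy macro set: each name maps to its body, a tuple of tokens.
# Any token that is not a key of MACROS is treated as a primitive generator.
MACROS = {
    "m1": ("a", "b"),
    "m2": ("m1", "m1"),
    "m3": ("m2", "m2"),
    "m4": ("m3", "m3"),
}


def wrapped_length(name):
    """Length of a macro's body with inner macros left unexpanded."""
    return len(MACROS[name])


@lru_cache(maxsize=None)
def unwrapped_length(token):
    """Length of the word obtained by expanding everything to generators."""
    if token not in MACROS:
        return 1  # a generator contributes a single symbol
    return sum(unwrapped_length(t) for t in MACROS[token])


@lru_cache(maxsize=None)
def depth(token):
    """Nesting depth: 0 for generators, 1 + deepest constituent otherwise."""
    if token not in MACROS:
        return 0
    return 1 + max(depth(t) for t in MACROS[token])


if __name__ == "__main__":
    for name in MACROS:
        print(name, depth(name), wrapped_length(name), unwrapped_length(name))
```

In this toy setting the wrapped length stays at 2 while the unwrapped length is 2 raised to the depth. The open problem asks which MathLib elements (definitions, theorems, or some subset of them) should play the role that the keys of `MACROS` play here.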

References

We do not identify a precise “macro set” within MathLib (or HM more generally) that maps to the macro set in the monoid, but regard this as a deep open problem, tantamount to locating the owner's manual for HM.

Compression is all you need: Modeling Mathematics (2603.20396 - Aksenov et al., 20 Mar 2026) in Introduction (Section 1)