
CPMöbius: Invariant Frameworks in RL & Geometry

Updated 10 February 2026
  • CPMöbius is a suite of invariant frameworks integrating data-free reinforcement learning, conformal geometry operators, and computational libraries.
  • Its reinforcement learning component features a unique Coach-Player paradigm that employs iterative curriculum adaptation and GRPO to enhance reasoning capabilities.
  • The computational tools include conformal-to-Einstein operators and C++ libraries for Möbius-invariant manipulations, enabling advanced visualizations in non-Euclidean spaces.

CPMöbius (CPMobius) refers to several mathematical, algorithmic, and computational frameworks centered on Möbius invariance, appearing in reinforcement learning for reasoning, conformal geometry, and computational geometry toolkits. The term encompasses (1) a data-free reinforcement learning protocol for LLMs, (2) a geometric operator on Möbius surfaces in conformal geometry, and (3) a software system for Möbius-invariant manipulations in non-Euclidean geometry.

1. Data-Free Cooperative RL for Reasoning: The CPMöbius Algorithm

CPMöbius introduces a collaborative Coach-Player paradigm designed for data-free reinforcement learning of reasoning models—specifically LLMs trained without reliance on external human-labeled data (Li et al., 3 Feb 2026). Unlike traditional adversarial self-play, CPMöbius employs two independently optimizing, cooperative agents: a Coach, acting as an adaptive curriculum designer, and a Player, the primary reasoning policy.

Coach-Player Iterative Loop

  • Coach Policy ($\pi_C$): Samples instructive tasks tailored to current Player competence and filters instructions via an on-the-fly difficulty check, retaining only those that induce Player roll-out accuracy within $[0.2,\,0.8]$.
  • Player Policy ($\pi_P$): Solves the given tasks, generating $n$ independent answers per instruction. Pseudo-labels are set by majority vote, and the Player is rewarded for agreement with these pseudo-labels.
  • Reward Structure: The Player is rewarded per instance, while the Coach is only rewarded if tasks lead to both high immediate Player accuracy and actual improvement as measured by a held-out validation set. The Coach update thus couples local task utility to global learning progress.
  • Optimization: The Player is trained via Group Relative Policy Optimization (GRPO), whereas the Coach uses REINFORCE, enabling both agents to learn from interleaved, data-free curriculum progression.
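The Player's pseudo-labelling step can be illustrated with a minimal sketch. The function name `majority_vote_rewards` and the binary 1.0/0.0 reward are assumptions made for illustration; the source only states that pseudo-labels come from a majority vote and that agreement is rewarded.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Return (pseudo_label, rewards) for the n sampled answers to one instruction.

    The pseudo-label is the most frequent answer. Each answer receives 1.0 if it
    matches the pseudo-label and 0.0 otherwise (binary reward is an assumption).
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # most_common(1) returns the answer with the highest count; ties are
    # resolved in favour of the answer seen first.
    pseudo_label, _ = Counter(answers).most_common(1)[0]
    rewards = [1.0 if a == pseudo_label else 0.0 for a in answers]
    return pseudo_label, rewards


label, rewards = majority_vote_rewards(["12", "12", "7", "12"])
```

The mean of `rewards` is the fraction of answers that agree with the pseudo-label, which is one plausible way to estimate the roll-out accuracy used by the Coach's difficulty check.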

RL Formulation

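For the Player, the group-relative advantage of the $j$-th answer to instruction $x_i$ is

$$A_{i,j} = \frac{r_{i,j} - \mathrm{mean}(\{ r_{i,\ell} \})}{\mathrm{std}(\{ r_{i,\ell} \})}$$

where the mean and standard deviation run over the $n$ answers $\ell = 1, \dots, n$ sampled for $x_i$. The GRPO loss is

$$\mathcal{L}_{\mathrm{GRPO}}(\pi_P) = \mathbb{E}_{x_i,\,\{y_{i,j}\}} \left[ \frac{1}{n} \sum_{j=1}^n \min\bigl(r_{i,j}A_{i,j},\ \mathrm{clip}(r_{i,j},1-\epsilon,1+\epsilon)A_{i,j}\bigr) \right] - \beta D_{\mathrm{KL}}(\pi_P\|\pi_{\mathrm{ref}})$$

In the advantage, $r_{i,j}$ is the reward of answer $y_{i,j}$. The source reuses the same symbol inside the clipped term, where standard GRPO places the probability ratio between the current and the sampling policy.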
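The advantage normalization can be sketched in a few lines. The helper name `group_relative_advantages`, the use of the population standard deviation, and the small `eps` guard against a zero denominator are assumptions of this sketch, not details given in the source.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize the rewards of one instruction's answer group.

    Each reward is centred by the group mean and divided by the group standard
    deviation. eps (an assumption) avoids division by zero when all answers
    receive the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; sample std is also plausible
    return [(r - mean) / (std + eps) for r in rewards]


advantages = group_relative_advantages([1.0, 1.0, 0.0, 1.0])
```

When every answer in a group earns the same reward, all advantages are zero and the group contributes no gradient signal, which is one reason tasks that are always or never solved are unhelpful for training.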

For the Coach:

$$R_C(x_i) = R_P(x_i) \times A_t$$

where $R_P(x_i)$ is the Player's average reward and $A_t$ is the validation set accuracy improvement.
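As a small worked illustration of the Coach reward, the sketch below multiplies the Player's average reward on a task by the change in validation accuracy. The function name `coach_reward` and the choice to measure improvement as a plain difference between two accuracy values are assumptions; the source does not say whether a negative improvement is clipped.

```python
def coach_reward(player_rewards, val_acc_before, val_acc_after):
    """Coach reward for one task: average Player reward times validation gain.

    player_rewards: per-answer Player rewards on this task.
    val_acc_before, val_acc_after: held-out validation accuracy before and
    after the Player update (difference used as the improvement; assumption).
    """
    if not player_rewards:
        raise ValueError("need at least one Player reward")
    r_p = sum(player_rewards) / len(player_rewards)
    a_t = val_acc_after - val_acc_before
    return r_p * a_t


example = coach_reward([1.0, 1.0, 0.0, 1.0], val_acc_before=0.40, val_acc_after=0.44)
```

Because the two factors are multiplied, a task on which the Player scores well yields no Coach reward unless validation accuracy also improves.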

Curriculum Adaptation

By explicitly filtering task difficulties into a zone of proximal development, CPMöbius avoids trivial or impossibly hard tasks, promoting stable curriculum ramping and continuous Player improvement.
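The difficulty filter amounts to a band-pass on estimated roll-out accuracy. The sketch below assumes an inclusive interval and assumes that each candidate task already carries an accuracy estimate; the function name `filter_tasks` is hypothetical.

```python
def filter_tasks(tasks_with_accuracy, low=0.2, high=0.8):
    """Keep tasks whose Player roll-out accuracy lies within [low, high].

    tasks_with_accuracy: iterable of (task, accuracy) pairs, where accuracy is
    the fraction of sampled answers judged correct for that task.
    """
    return [task for task, acc in tasks_with_accuracy if low <= acc <= high]


candidates = [("too hard", 0.0), ("useful", 0.5), ("too easy", 1.0), ("borderline", 0.2)]
kept = filter_tasks(candidates)
```

Tasks with accuracy 0.0 or 1.0 are dropped, which matches the stated goal of avoiding impossibly hard and trivial tasks.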

Empirical Results

On six mathematical reasoning benchmarks (AMC 2023, AIME 2024/2025, Minerva, MATH-500, Olympiad-Bench), CPMöbius applied to Qwen2.5-Math-7B-Instruct gives the following results:

| Model Variant | Overall Accuracy (%) | OOD Accuracy (%) |
|---|---|---|
| Base model | 35.8 | 37.4 |
| + R-Zero | 36.9 | 38.1 |
| + RENT | 39.2 | 38.8 |
| + CPMöbius | 40.7 | 38.3 |

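The source reports gains of +4.9 overall and +5.4 out-of-distribution over the base model. The overall figure matches the table (40.7 vs. 35.8), whereas the OOD column as listed reads 38.3 vs. 37.4. On overall accuracy, CPMöbius exceeds both entropy minimization (RENT) and adversarial self-play (R-Zero).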

Ablation Analysis

  • With Coach frozen: −3.5% accuracy
  • No Coach warm-up: performance near base
  • No difficulty filtering: notable regression

Thus, online Coach adaptation and filtered curriculum are critical (Li et al., 3 Feb 2026).

2. CPMöbius Operator in Conformal Geometry

Originally, CPMöbius denotes the conformal-to-Einstein operator on two-dimensional Möbius surfaces, classifying metrics conformally related to Einstein geometries (Randall, 2013).

Möbius Surfaces and Tractor Bundles

Given a Riemann surface $(M^2, [g])$, a Möbius structure is an additional symmetric tensor $P_{ab}$ (Rho tensor) which, alongside the conformal class, encodes second-order geometric invariants under rescaling. The tractor bundle $\mathcal{E}^A \cong E[1] \oplus E_a[1] \oplus E[-1]$ provides a natural setting for invariant differential operators.

The Conformal-to-Einstein Equation

The CPMöbius operator is the trace-free Hessian equation

$$\bigl( \nabla_{(a} \nabla_{b)} \sigma + P_{ab}\, \sigma \bigr)_0 = 0$$

for a nonvanishing section $\sigma$ (scale). Its prolongation system introduces a Cotton-York curvature term unique to dimension 2:

$$\begin{aligned} \nabla_a\sigma &= H_a \\ \nabla_a H_b &= -P_{ab}\sigma - g_{ab}A \\ \nabla_a A &= -Y_a \sigma + P_a{}^b H_b \end{aligned}$$

where $Y_{abc}$ is the Cotton-York tensor, yielding nontrivial local obstructions to the existence of solutions.
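A short worked step, under the assumption that $\nabla$ is the torsion-free connection of a metric $g$ in the conformal class, shows how the first two prolongation equations encode the trace-free Hessian equation. The symbols $T_{ab}$, $\Delta$, and $P$ are notation introduced here for illustration and do not appear in the source.

```latex
% Trace-free part of a symmetric tensor T_{ab} in dimension 2:
(T_{ab})_0 = T_{ab} - \tfrac{1}{2}\, g^{cd} T_{cd}\, g_{ab}

% Substituting H_b = \nabla_b \sigma into the second prolongation equation:
\nabla_a \nabla_b \sigma + P_{ab}\, \sigma = -\, g_{ab}\, A

% The right-hand side is pure trace, so the trace-free part of the
% left-hand side vanishes, which is the conformal-to-Einstein equation.
% Contracting with g^{ab} (using g^{ab} g_{ab} = 2) identifies A:
A = -\tfrac{1}{2} \bigl( \Delta \sigma + P\, \sigma \bigr),
\qquad \Delta = g^{ab} \nabla_a \nabla_b, \quad P = g^{ab} P_{ab}
```

Read this way, $A$ is an auxiliary variable that records the trace of the Hessian expression, and the third prolongation equation is the differential consequence where the Cotton-York term enters.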

Tractor Characterization

Parallel sections of the prolongation tractor connection correspond bijectively to solutions of the CPMöbius PDE, with the key prescription:

$$\nabla^{\mathrm{prol}}_a \begin{pmatrix} \sigma \\ H_b \\ A \end{pmatrix} = 0$$

Local Obstructions and Kernel Dimension

Using differentiation and algebraic constraints, all possible local solution spaces (kernel of the operator) are classified:

| Geometric Condition | Kernel Dimension |
|---|---|
| Möbius-flat (Cotton-York zero) | 4 |
| Non-flat, generic | 1 |
| Non-flat, non-generic, ODE case | 2 |

All other scenarios lead to obstruction and lack of solutions.

3. CPMöbius Computational Libraries in Geometry

MoebInv implements CPMöbius as a suite of C++ libraries for analytic, symbolic, and graphical manipulations in Möbius-invariant non-Euclidean geometry (Kisil, 2019).

Library Architecture

  • cycle: Models a single cycle—the locus of a quadratic equation in arbitrary signature—including operations such as Möbius transforms, inner products, and normalization.
  • figure: Manages graphs of cycles linked by Möbius-invariant relations—orthogonality, tangency, incidence, passage through infinity—supports hierarchical constraint solving and relation checking.

Key classes include Cycle, CycleSpace, Figure, and Relation. Python and Jupyter bindings expose these capabilities interactively.

Mathematical Foundation

A cycle in $\mathbb{R}^{p,q}$ is

$$k \cdot \langle x, x\rangle_{p,q} + 2\langle \ell, x \rangle_{p,q} + m = 0$$

with $\langle x, x \rangle_{p,q}$ the signatured quadratic form. The Fillmore–Springer–Cnops construction embeds cycles as $2\times 2$ Clifford matrices, permitting Möbius actions as matrix conjugations.
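For the Euclidean plane (signature $(2,0)$), completing the square in the cycle equation recovers the centre and radius of a circle. The sketch below is plain Python written for illustration and is not the MoebInv API; the function name `circle_from_cycle` is hypothetical, and the sign convention follows the equation as written in this article.

```python
import math


def circle_from_cycle(k, l, m):
    """Centre and radius of k*<x,x> + 2*<l,x> + m = 0 in the Euclidean plane.

    Dividing by k and completing the square gives
    |x + l/k|^2 = |l|^2 / k^2 - m / k.
    """
    if k == 0:
        raise ValueError("k = 0 describes a line, not a circle")
    centre = (-l[0] / k, -l[1] / k)
    radius_sq = (l[0] ** 2 + l[1] ** 2) / k ** 2 - m / k
    if radius_sq < 0:
        raise ValueError("no real points: squared radius is negative")
    return centre, math.sqrt(radius_sq)


unit_centre, unit_radius = circle_from_cycle(1, (0, 0), -1)
centre, radius = circle_from_cycle(2, (-2, -4), 2)
```

The second call uses a non-normalized cycle: scaling $(k, \ell, m)$ by a common nonzero factor leaves the circle unchanged, which is why cycles are naturally treated as projective objects.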

The inner product

$$\langle C_1, C_2 \rangle = -\mathrm{tr}(C_1 J C_2 J)$$

unifies geometry: orthogonality, tangency, and incidence.
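To make the unifying role of a cycle product concrete, the sketch below uses an elementary bilinear form on Euclidean circles written as $(k, \ell, m)$ triples. It is plain Python, not the MoebInv API, and its normalization relative to the trace formula above is not taken from the source; the helper names `circle` and `cycle_product` are hypothetical.

```python
import math


def circle(cx, cy, r):
    """Cycle triple (k, l, m) for the circle with centre (cx, cy) and radius r,
    using the convention k*<x,x> + 2*<l,x> + m = 0 with k = 1."""
    return (1.0, (-cx, -cy), cx * cx + cy * cy - r * r)


def cycle_product(c1, c2):
    """Symmetric bilinear form 2*<l1,l2> - k1*m2 - k2*m1.

    For circles with k = 1 it equals r1^2 + r2^2 - d^2, where d is the
    distance between centres, so it vanishes exactly for orthogonal circles.
    """
    (k1, l1, m1), (k2, l2, m2) = c1, c2
    dot = l1[0] * l2[0] + l1[1] * l2[1]
    return 2 * dot - k1 * m2 - k2 * m1


unit = circle(0, 0, 1)
orthogonal = circle(math.sqrt(2), 0, 1)  # d^2 = 2 = r1^2 + r2^2
tangent = circle(2, 0, 1)                # d = r1 + r2

orth_value = cycle_product(unit, orthogonal)
tang_value = cycle_product(unit, tangent)
```

With this form, orthogonality is the condition that the product is zero, while tangency is the condition that the squared product equals the product of the two self-products, so both relations reduce to algebra on the same quantity.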

Code and Workflow

Both C++ and Python workflows are supported. Sample usage includes:

  • Defining unit circles and lines via CycleSpace.
  • Attaching cycles and investigating relations via Figure.
  • Solving symbolic constraints or numeric relations, visually manipulating cycle ensembles.
  • Exporting Asymptote and SVG diagrams for publication.

GUI and Application

A dedicated Qt5 GUI enables visual construction and manipulation of cycle configurations. This enables streamlined experimentation and illustration of Möbius-invariant configurations, such as Apollonian circle packings.

Performance is optimized for 2D/3D work with both symbolic and numeric backends.

4. Comparative Analysis and Significance

The CPMöbius algorithm in reinforcement learning provides the first fully data-free, cooperative RL paradigm for LLMs, enabling measurable advances in mathematical reasoning capabilities without external data (Li et al., 3 Feb 2026). Its cooperative (Coach-Player) optimization is distinct from adversarial self-play, yielding more stable curricula, efficient skill acquisition, and better generalization.

The conformal-to-Einstein operator (CPMöbius) in geometry highlights subtle features unique to two-dimensional Möbius geometry, such as explicit Cotton-York obstructions that have no higher-dimensional analogue (Randall, 2013). This advances the classification of conformal geometries with Einstein representatives.

Computationally, CPMöbius (MoebInv) delivers a practical toolkit for researchers exploring the full algebraic and geometric structure of cycles, enabling high-level manipulations and visualization in arbitrary signature spaces (Kisil, 2019).

5. Limitations and Future Directions

For data-free RL, CPMöbius is currently limited to mathematical reasoning domains; generalization to multi-modal or open-ended task spaces is an open problem. The dependence on a warmed-up Coach motivates further work on unsupervised Coach bootstrapping and long-term loop stability as Player competence scales.

In differential geometry, CPMöbius obstructions are fully understood only for analytic Möbius surfaces with specified curvature; further study may broaden the classification.

The computational libraries are primarily optimized for low-dimensional, non-degenerate signatures; extension to high-dimensional or degenerate cases requires further symbolic optimization.

6. References

  • CPMöbius: Iterative Coach-Player Reasoning for Data-Free Reinforcement Learning (Li et al., 3 Feb 2026)
  • The conformal-to-Einstein equation on Möbius surfaces (Randall, 2013)
  • MoebInv: C++ libraries for manipulations in non-Euclidean geometry (Kisil, 2019)
