ENIGMA: Entropic Mutual-Info LLM Alignment
- ENIGMA introduces a unified framework that projects explicit organisational principles onto the model's internal information manifold for robust LLM alignment.
- It integrates group-relative policy optimisation, mutual-information self-supervision, and Sinkhorn optimal transport regularisation to enhance reasoning and ensure geometrically smooth model updates.
- Empirical evaluations demonstrate improved stability and benchmark accuracy when hidden representations are aligned with high-Sufficiency-Index (SI) principles, tracked through measurable information-theoretic metrics.
Entropic Mutual-Information Geometry Large-LLM Alignment (ENIGMA) is a unified approach for aligning large language models (LLMs) by treating organisational principles or policies as explicit directions on the internal information manifold of a neural network. ENIGMA frames alignment, reasoning, and robustness as projections of a single information-geometric objective and implements this by combining advanced policy optimisation, mutual-information self-supervision, and manifold regularisation. The method is designed to induce principled reasoning, measurable by information-theoretic metrics, without relying on reward models or offline preference datasets, thereby addressing key challenges in LLM alignment regarding transparency, robustness, and generalisation (Seneque et al., 13 Oct 2025).
1. Information-Geometric Foundations
ENIGMA builds upon the geometry of information encoded in the hidden space of LLMs by leveraging the Fisher–Rao metric and optimal transport theory. Rather than viewing principles or policies as external constraints, ENIGMA embeds these as “directions to move” in the information manifold, which is defined by the geometry of the model’s hidden-state probability distributions. The alignment process is thus conceptualised as movement along information-theoretically motivated paths, bounded in terms of divergence and transport cost, to ensure both local consistency and global robustness.
Mathematically, the framework monitors the evolution of the model’s hidden state and output distributions using quantities such as Jensen–Shannon divergence, Bhattacharyya angle, Fréchet distance, and participation ratio, linking geometric change to the satisfaction of alignment principles and the robustness of reasoning processes.
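As a concrete illustration, the sketch below computes several of these diagnostics from empirical hidden-state samples and categorical output distributions. This is a minimal sketch, not the paper's implementation; the function names, the Gaussian approximation behind the Fréchet distance, and the numerical guards are all assumptions.

```python
import numpy as np
from scipy.linalg import sqrtm


def _kl(p: np.ndarray, q: np.ndarray) -> float:
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))


def jensen_shannon(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence (nats) between two categorical distributions."""
    m = 0.5 * (p + q)
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def bhattacharyya_angle(p: np.ndarray, q: np.ndarray) -> float:
    """Arccos of the Bhattacharyya coefficient, a Fisher-Rao-related angle."""
    bc = np.sum(np.sqrt(p * q))
    return float(np.arccos(np.clip(bc, 0.0, 1.0)))


def participation_ratio(h: np.ndarray) -> float:
    """Effective dimensionality of hidden states h (n_samples x d):
    (sum of eigenvalues)^2 / sum of squared eigenvalues of the covariance."""
    eig = np.linalg.eigvalsh(np.cov(h, rowvar=False))
    return float(eig.sum() ** 2 / np.sum(eig ** 2))


def frechet_distance(h1: np.ndarray, h2: np.ndarray) -> float:
    """Squared 2-Wasserstein distance between Gaussian fits of two state clouds."""
    mu1, mu2 = h1.mean(axis=0), h2.mean(axis=0)
    s1, s2 = np.cov(h1, rowvar=False), np.cov(h2, rowvar=False)
    covmean = sqrtm(s1 @ s2).real  # matrix square root; imaginary noise dropped
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(s1 + s2 - 2 * covmean))
```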
2. Core Training Architecture
ENIGMA employs a single-loop training paradigm integrating three main components:
- Group-Relative Policy Optimisation (GRPO): An on-policy, critic-free RL method that operates over groups of completions sampled from the same prompt, computing local advantage signals within each group and enforcing trust-region constraints on the Fisher–Rao manifold of the model's token distributions (a minimal sketch of the group-relative advantage follows this list).
- Self-Supervised Alignment with Mutual Information (SAMI): A symmetric InfoNCE auxiliary loss that maximises the conditional mutual information between generated chain-of-thought completions and the encoded organisational principle. SAMI uses both row- and column-InfoNCE to align completions with correct principles and vice versa.
- Entropic Sinkhorn Optimal Transport (OT) Regulariser: A divergence penalty applied to hidden-state distributions that bounds geometry drift by comparing the current policy’s hidden states against a reference snapshot, using the Sinkhorn divergence for smoothed and unbiased transport cost estimation.
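For concreteness, here is a minimal sketch of the group-relative advantage signal that critic-free GRPO variants typically compute: rewards for a group of completions sampled from the same prompt are normalised within the group, removing the need for a learned value baseline. The function name, tensor shapes, and exact normalisation are assumptions, not the paper's code.

```python
import torch


def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar rewards, one per completion.

    Returns advantages centred and scaled within each prompt's group."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)  # group-relative, critic-free baseline
```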
The composite objective is expressed as:

$$\mathcal{L} = \mathcal{L}_{\text{GRPO}} + \lambda_{\text{MI}}\,\mathcal{L}_{\text{InfoNCE}} + \lambda_{\text{OT}}\,\mathcal{L}_{\text{OT}},$$

where $\lambda_{\text{MI}}$ and $\lambda_{\text{OT}}$ are hyperparameters weighting the InfoNCE auxiliary and the OT regulariser.
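A minimal sketch of how the three terms combine into one scalar objective, assuming the component losses are computed elsewhere; the default weight values are illustrative, not the paper's settings.

```python
import torch


def composite_loss(loss_grpo: torch.Tensor,
                   loss_infonce: torch.Tensor,
                   loss_ot: torch.Tensor,
                   lambda_mi: float = 0.1,
                   lambda_ot: float = 0.01) -> torch.Tensor:
    """Single-loop ENIGMA objective: GRPO term plus weighted auxiliaries."""
    return loss_grpo + lambda_mi * loss_infonce + lambda_ot * loss_ot
```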
3. Self-Supervised Mutual Information Alignment
To directly bind the model's reasoning to explicit principles, ENIGMA uses a SAMI-style InfoNCE objective that lower bounds $I(y; c \mid x)$, the mutual information between the completion $y$ and the principle $c$ given the prompt $x$. The method computes, for each completion-principle pair, a contrastive loss over batches:
- Row InfoNCE: For each completion, the score under the matching principle must be higher than for shadow principles.
- Column InfoNCE: For each principle, the score for the matching completion must be higher than for distractor completions.
Let $s_{ij}$ denote the log-likelihood of completion $y_i$ conditioned on principle $c_j$. The row InfoNCE is:

$$\mathcal{L}_{\text{row}} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp(s_{ii})}{\sum_{j=1}^{N} \exp(s_{ij})},$$

and analogously for the column direction, with the roles of completions and principles transposed. These metrics provide falsifiable, quantitative lower bounds on the extent to which model reasoning encodes the target principle.
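The symmetric loss is straightforward to express over a batch score matrix. Below is a minimal sketch, assuming a square matrix `scores[i, j]` holding the log-likelihood of completion `i` under principle `j` with matched pairs on the diagonal; the shapes and mean reduction are assumptions.

```python
import torch
import torch.nn.functional as F


def symmetric_infonce(scores: torch.Tensor) -> torch.Tensor:
    """scores: (N, N) log-likelihoods; diagonal entries are matching pairs."""
    targets = torch.arange(scores.size(0), device=scores.device)
    row_loss = F.cross_entropy(scores, targets)      # completions vs. shadow principles
    col_loss = F.cross_entropy(scores.t(), targets)  # principles vs. distractor completions
    return 0.5 * (row_loss + col_loss)
```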
4. Geometric Regularisation via Sinkhorn Optimal Transport
The entropic Sinkhorn regulariser constrains the geometry of hidden-state distributions, preventing abrupt manifold drift that can lead to overfitting or misalignment. The Sinkhorn divergence between empirical hidden-state distributions $\mu$ and $\nu$ is calculated as:

$$S_\varepsilon(\mu, \nu) = \mathrm{OT}_\varepsilon(\mu, \nu) - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\mu, \mu) - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\nu, \nu),$$

where $\mathrm{OT}_\varepsilon$ denotes the entropic optimal transport cost. This debiased form keeps the model's geometric moves smooth, bounding drift in the high-dimensional state space and thereby controlling both robustness and alignment error.
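The following numpy sketch implements the debiased divergence above for two empirical hidden-state clouds, using plain Sinkhorn fixed-point iterations. The epsilon value, iteration count, and squared-Euclidean ground cost are assumptions; a production version would iterate in log space for numerical stability.

```python
import numpy as np


def entropic_ot(x: np.ndarray, y: np.ndarray, eps: float = 0.1, n_iter: int = 200) -> float:
    """Transport cost of the entropic plan between uniform measures on x, y."""
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # squared Euclidean
    k = np.exp(-cost / eps)            # Gibbs kernel
    a = np.full(len(x), 1.0 / len(x))  # uniform source weights
    b = np.full(len(y), 1.0 / len(y))  # uniform target weights
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):            # Sinkhorn fixed-point updates
        u = a / (k @ v)
        v = b / (k.T @ u)
    plan = u[:, None] * k * v[None, :]
    return float(np.sum(plan * cost))


def sinkhorn_divergence(x: np.ndarray, y: np.ndarray, eps: float = 0.1) -> float:
    """Debiased S_eps(x, y) = OT(x, y) - 0.5 * OT(x, x) - 0.5 * OT(y, y)."""
    return (entropic_ot(x, y, eps)
            - 0.5 * entropic_ot(x, x, eps)
            - 0.5 * entropic_ot(y, y, eps))
```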
5. Metrics and Principle Selection
A key aspect of ENIGMA is pre-selecting organisational principles based on their Sufficiency Index (SI), a composite score aggregating predictive information (e.g., the gain in completion log-likelihood when conditioning on the principle), InfoNCE lower-bound bits, and separation measures such as AUC. High-SI principles (as measured before training) are empirically shown to yield steadier training dynamics and improved downstream performance. SI is calculated by aggregating z-scored versions of the component metrics.
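A minimal sketch of that aggregation, assuming one row of component metrics per candidate principle and equal weighting of the z-scored columns (the paper may weight components differently):

```python
import numpy as np


def sufficiency_index(metrics: np.ndarray) -> np.ndarray:
    """metrics: (n_principles, n_metrics), e.g. columns for predictive
    information, InfoNCE lower-bound bits, and separation AUC.

    Returns one SI score per candidate principle."""
    z = (metrics - metrics.mean(axis=0)) / (metrics.std(axis=0) + 1e-8)
    return z.mean(axis=1)  # equal-weight aggregate of standardised metrics
```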
During training and evaluation, “clean” InfoNCE diagnostics measure mutual information lower bounds under matched negatives, providing transparent, falsifiable signals for principle encoding. These include:
- Clean row bound: $\hat{I}_{\text{row}} \ge \log(K + 1) - \mathcal{L}_{\text{row}}$, where $K$ is the number of negatives.
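For example, converting a measured row InfoNCE loss into a bound in bits (the helper name and nats-to-bits conversion are illustrative):

```python
import math


def clean_row_bound_bits(infonce_loss_nats: float, num_negatives: int) -> float:
    """Lower bound on I(y; c | x) in bits, given a row InfoNCE loss with K negatives."""
    return (math.log(num_negatives + 1) - infonce_loss_nats) / math.log(2)
```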
6. Experimental Outcomes and Information Manifold Analysis
Empirical evaluations on small LLMs (e.g., 1B parameters) using chain-of-thought benchmarks demonstrate the effectiveness of ENIGMA:
- Models aligned with high-SI principles exhibit lower training variance and improved accuracy on benchmarks like GPQA and TruthfulQA compared to GRPO ablations lacking mutual information alignment.
- Ablation studies reveal that MI-driven alignment, in combination with format-only rewards, results in more robust encoding of principles, as opposed to superficial formatting compliance.
Information-geometric analysis of internal representations tracks the evolution of manifold structures through metrics such as Bhattacharyya angle, Jensen–Shannon divergence, Fréchet distance, and rank measures, verifying that desired structural manifold changes occur during principled alignment.
7. Impact and Broader Implications
ENIGMA establishes reasoning, alignment, and robustness as projections of a single entropic-geometric objective. By jointly optimising the policy, MI-based principle encoding, and manifold regularity:
- It eliminates the need for external reward models, instead grounding alignment directly in model internals.
- It enables explicit, quantitative monitoring of principle encoding using MI bounds and the SI metric.
- It ensures the entire reasoning trace, not merely the final output, is shaped towards desired constitutional principles.
This suggests a plausible direction for advancing trusted, interpretable LLM capability: ENIGMA's information-geometric perspective provides a rigorous, falsifiable, and actionable framework for principled reasoning, robust alignment, and controlled manifold evolution in LLM design.