ENIGMA: Entropic Mutual-Info LLM Alignment

Updated 20 October 2025
  • ENIGMA introduces a unified framework that projects explicit organisational principles onto the model's internal information manifold for robust LLM alignment.
  • It integrates group-relative policy optimisation, mutual-information self-supervision, and Sinkhorn optimal transport regularisation to enhance reasoning and ensure geometrically smooth model updates.
  • Empirical evaluations demonstrate improved training stability and benchmark accuracy when hidden representations are aligned with high-SI principles, with progress tracked by measurable information-theoretic metrics.

Entropic Mutual-Information Geometry Large-Language Model Alignment (ENIGMA) is a unified approach for aligning LLMs by treating organisational principles or policies as explicit directions on the internal information manifold of a neural network. ENIGMA frames alignment, reasoning, and robustness as projections of a single information-geometric objective and implements this by combining advanced policy optimisation, mutual information-based self-supervision, and manifold regularisation. The method is designed to induce principled reasoning—measurable by information-theoretic metrics—without relying on reward models or offline preference datasets, thereby addressing key challenges in LLM alignment regarding transparency, robustness, and generalisation (Seneque et al., 13 Oct 2025).

1. Information-Geometric Foundations

ENIGMA builds upon the geometry of information encoded in the hidden space of LLMs by leveraging the Fisher–Rao metric and optimal transport theory. Rather than viewing principles or policies as external constraints, ENIGMA embeds these as “directions to move” in the information manifold, which is defined by the geometry of the model’s hidden-state probability distributions. The alignment process is thus conceptualised as movement along information-theoretically motivated paths, bounded in terms of divergence and transport cost, to ensure both local consistency and global robustness.

Mathematically, the framework monitors the evolution of the model’s hidden state and output distributions using quantities such as Jensen–Shannon divergence, Bhattacharyya angle, Fréchet distance, and participation ratio, linking geometric change to the satisfaction of alignment principles and the robustness of reasoning processes.
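
As a concrete illustration, these diagnostics are straightforward to compute from sampled token distributions and hidden states. The following is a minimal NumPy sketch, not the authors' code; the function names and uniform-sample assumptions are purely illustrative:

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between discrete distributions p and q."""
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def bhattacharyya_angle(p, q):
    """Angle arccos(BC) on the probability simplex, BC = sum_i sqrt(p_i * q_i)."""
    bc = np.sum(np.sqrt(p * q))
    return float(np.arccos(np.clip(bc, 0.0, 1.0)))

def participation_ratio(h):
    """Effective dimensionality of hidden states h (n_samples x d):
    (sum of covariance eigenvalues)^2 / sum of squared eigenvalues."""
    eig = np.linalg.eigvalsh(np.cov(h, rowvar=False))
    eig = np.clip(eig, 0.0, None)
    return float(eig.sum() ** 2 / (np.square(eig).sum() + 1e-12))
```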

2. Core Training Architecture

ENIGMA employs a single-loop training paradigm integrating three main components:

  • Group-Relative Policy Optimisation (GRPO): An on-policy, critic-free RL method that operates over groups of completions, computing local advantage signals and enforcing trust region constraints on the Fisher–Rao manifold of the model’s token distributions.
  • Self-Supervised Alignment with Mutual Information (SAMI): A symmetric InfoNCE auxiliary loss that maximises the conditional mutual information between generated chain-of-thought completions and the encoded organisational principle. SAMI uses both row- and column-InfoNCE to align completions with correct principles and vice versa.
  • Entropic Sinkhorn Optimal Transport (OT) Regulariser: A divergence penalty applied to hidden-state distributions that bounds geometry drift by comparing the current policy’s hidden states against a reference snapshot, using the Sinkhorn divergence for smoothed and unbiased transport cost estimation.

The composite objective is expressed as:

$$L_{\mathrm{ENIGMA}} = L_{\mathrm{GRPO}} + \lambda_{\mathrm{SAMI}} L_{\mathrm{SAMI}} + \lambda_{\mathrm{OT}} R_{\mathrm{OT}},$$

where $\lambda_{\mathrm{SAMI}}$ and $\lambda_{\mathrm{OT}}$ are hyperparameters weighting the InfoNCE auxiliary loss and the OT regulariser.
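
For concreteness, a minimal sketch of how the three terms combine, together with a group-relative advantage of the kind GRPO computes (here a per-group z-score of completion rewards, a common GRPO formulation that the paper may instantiate differently); the hyperparameter values are illustrative, not taken from the paper:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """rewards: (n_prompts, group_size). A critic-free advantage signal:
    each completion's reward z-scored within its group for the same prompt."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

def enigma_objective(l_grpo: torch.Tensor, l_sami: torch.Tensor, r_ot: torch.Tensor,
                     lambda_sami: float = 0.1, lambda_ot: float = 0.01) -> torch.Tensor:
    """Composite loss from the equation above; the default weights are placeholders."""
    return l_grpo + lambda_sami * l_sami + lambda_ot * r_ot
```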

3. Self-Supervised Mutual Information Alignment

To directly bind the model’s reasoning to explicit principles, ENIGMA uses a SAMI-style InfoNCE objective that lower bounds $I(Y; C \mid X)$, the mutual information between the completion $Y$ and principle $C$ given prompt $X$. The method computes, for each completion–principle pair, a contrastive loss over batches:

  • Row InfoNCE: For each completion, the score under the matching principle must be higher than for shadow principles.
  • Column InfoNCE: For each principle, the score for the matching completion must be higher than for distractor completions.

Let $L_{ij}$ denote the log-likelihood of completion $y_i$ conditioned on principle $c_j$. The row InfoNCE is:

$$L_{\mathrm{row}} = -\frac{1}{N} \sum_{i} \log \left( \frac{e^{L_{ii}}}{\sum_j e^{L_{ij}}} \right),$$

and analogously for the column direction. These losses yield falsifiable, quantitative lower bounds on the extent to which the model’s reasoning encodes the target principle.
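
A minimal sketch of the symmetric objective, assuming a score matrix with $L_{ij}$ at entry $(i, j)$ so that matched completion–principle pairs lie on the diagonal; this is an illustration, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def symmetric_infonce(log_lik: torch.Tensor) -> torch.Tensor:
    """log_lik: (N, N) matrix of log-likelihoods L[i, j] = log p(y_i | x_i, c_j)."""
    n = log_lik.size(0)
    targets = torch.arange(n, device=log_lik.device)
    l_row = F.cross_entropy(log_lik, targets)      # softmax over principles (rows)
    l_col = F.cross_entropy(log_lik.t(), targets)  # softmax over completions (columns)
    return 0.5 * (l_row + l_col)
```

Averaging the two directions is one natural way to symmetrise the loss; the paper's exact weighting of the row and column terms may differ.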

4. Geometric Regularisation via Sinkhorn Optimal Transport

The entropic Sinkhorn regulariser constrains the geometry of hidden-state distributions, preventing the abrupt manifold drift that can lead to overfitting or misalignment. The Sinkhorn divergence $S_\epsilon(\alpha, \beta)$ between empirical hidden-state distributions is calculated as:

$$S_\epsilon(\alpha, \beta) = \mathrm{OT}_\epsilon(\alpha, \beta) - \frac{1}{2} \mathrm{OT}_\epsilon(\alpha, \alpha) - \frac{1}{2} \mathrm{OT}_\epsilon(\beta, \beta),$$

where $\mathrm{OT}_\epsilon$ denotes the entropic optimal transport cost. This keeps the model’s geometric updates smooth, bounding drift in the high-dimensional hidden-state space and thereby supporting both robustness and alignment.
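
A minimal log-domain Sinkhorn sketch for uniform empirical measures with squared-Euclidean cost, scoring $\mathrm{OT}_\epsilon$ as the transport cost of the entropic plan (one common convention); epsilon and the fixed iteration count are illustrative, and a production implementation would typically use an established OT library such as POT or geomloss:

```python
import math
import torch

def entropic_ot(x: torch.Tensor, y: torch.Tensor, eps: float = 0.1, n_iters: int = 200) -> torch.Tensor:
    """OT_eps between uniform empirical measures on point clouds x (n, d) and y (m, d)."""
    cost = torch.cdist(x, y) ** 2
    n, m = cost.shape
    log_a, log_b = -math.log(n), -math.log(m)   # uniform weights, in log space
    f = torch.zeros(n, dtype=x.dtype, device=x.device)
    g = torch.zeros(m, dtype=x.dtype, device=x.device)
    for _ in range(n_iters):                    # Sinkhorn dual updates in log space
        f = -eps * torch.logsumexp((g[None, :] - cost) / eps + log_b, dim=1)
        g = -eps * torch.logsumexp((f[:, None] - cost) / eps + log_a, dim=0)
    # transport plan pi_ij = a_i * b_j * exp((f_i + g_j - C_ij) / eps), in log space
    log_pi = (f[:, None] + g[None, :] - cost) / eps + log_a + log_b
    return (log_pi.exp() * cost).sum()

def sinkhorn_divergence(x: torch.Tensor, y: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """Debiased divergence S_eps(x, y) per the equation above."""
    return (entropic_ot(x, y, eps)
            - 0.5 * entropic_ot(x, x, eps)
            - 0.5 * entropic_ot(y, y, eps))
```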

5. Metrics and Principle Selection

A key aspect of ENIGMA is pre-selecting organisational principles based on their Sufficiency Index (SI), a composite score aggregating predictive information (e.g., $\Delta\mathrm{NLL}$ when conditioning on the principle), InfoNCE lower-bound bits, and separation measures such as AUC. High-SI principles (as measured before training) are empirically shown to yield steadier training dynamics and improved downstream performance. SI is calculated by aggregating z-scored versions of the component metrics.
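
As a sketch of the aggregation step, assuming per-principle arrays of component metrics already oriented so that larger is better; the exact component set and any weighting used in the paper may differ:

```python
import numpy as np

def sufficiency_index(components: dict) -> np.ndarray:
    """components: metric name -> array of per-principle scores, e.g.
    {'delta_nll': ..., 'infonce_bits': ..., 'auc': ...}. Returns the mean
    z-score across components for each candidate principle."""
    z = [(v - v.mean()) / (v.std() + 1e-12) for v in components.values()]
    return np.mean(z, axis=0)  # select the highest-SI principles before training
```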

During training and evaluation, “clean” InfoNCE diagnostics measure mutual information lower bounds under matched negatives, providing transparent, falsifiable signals for principle encoding. These include:

  • Clean row bound: $\bar{I}_{\mathrm{row}}^{\mathrm{clean}} = \log(K+1) - L_{\mathrm{row}}^{\mathrm{clean}}$, where $K$ is the number of negatives.
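
In code, assuming the clean row InfoNCE loss is measured in nats with $K$ matched negatives per positive:

```python
import math

def clean_row_bound(l_row_clean: float, k: int) -> float:
    """Lower bound on I(Y; C | X) in nats: log(K + 1) - L_row^clean."""
    return math.log(k + 1) - l_row_clean
```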

6. Experimental Outcomes and Information Manifold Analysis

Empirical evaluations on small LLMs (e.g., 1B parameters) using chain-of-thought benchmarks demonstrate the effectiveness of ENIGMA:

  • Models aligned with high-SI principles exhibit lower training variance and improved accuracy on benchmarks like GPQA and TruthfulQA compared to GRPO ablations lacking mutual information alignment.
  • Ablation studies reveal that MI-driven alignment, in combination with format-only rewards, results in more robust encoding of principles, as opposed to superficial formatting compliance.

Information-geometric analysis of internal representations tracks the evolution of manifold structures through metrics such as Bhattacharyya angle, Jensen–Shannon divergence, Fréchet distance, and rank measures, verifying that desired structural manifold changes occur during principled alignment.

7. Impact and Broader Implications

ENIGMA establishes reasoning, alignment, and robustness as entropic-geometric projections. By jointly optimising policy, MI-based principle encoding, and manifold regularity:

  • It eliminates the need for external reward models, instead grounding alignment directly in model internals.
  • It enables explicit, quantitative monitoring of principle encoding using MI bounds and the SI metric.
  • It ensures the entire reasoning trace, not merely the output token, is shaped towards desired constitutional principles.

This suggests a plausible direction for advancing trusted, interpretable LLM capability: ENIGMA’s information-geometric perspective provides a rigorous, falsifiable, and actionable framework for principled reasoning, robust alignment, and controlled manifold evolution in LLM design.
