Self-Supervised Alignment with Mutual Information
- SAMI is a framework that uses self-supervised contrastive learning to maximize conditional mutual information, aligning model responses with behavioral principles.
- It replaces traditional reinforcement learning by using a symmetric InfoNCE loss and in-batch negatives to enforce clear binding between responses and their governing principles.
- SAMI enables scalable, pluralistic model alignment without explicit preference labels, achieving robust performance in dialogue, summarization, and multi-task evaluations.
Self-Supervised Alignment with Mutual Information (SAMI) refers to a set of learning principles and algorithmic strategies that use mutual information maximization to align model predictions or representations with conditioning signals—such as principles, constitutions, or attributes—without reliance on explicit preference labels or reward models. In contemporary LLMs and multimodal systems, SAMI designates a family of iterative, information-theoretic approaches that optimize a contrastive (often InfoNCE) lower bound on the conditional mutual information I(Y; C | X), where Y is the model output, C is a set of behavioral or task principles, and X is the input. Recent developments have established SAMI as a scalable, self-supervised alignment solution for large generative models, especially in applications where human preferences are difficult to collect or generalize across tasks.
1. Foundational Concepts and Objectives
SAMI formalizes model alignment as the maximization of the conditional mutual information between the model’s response and a principle set (or constitution), given the task input:

$$I(Y; C \mid X) \;=\; \mathbb{E}_{p(x)\,p(c,\,y \mid x)}\!\left[\log \frac{p(y \mid c, x)}{p(y \mid x)}\right].$$

This measure quantifies the dependence of the output Y on the principle set C, conditioned on an input X. The operational goal is to ensure that, for a given prompt, generated content not only appears plausible but also encodes and disambiguates the underlying principles it is meant to embody.
The key methodological innovation is to substitute traditional reinforcement learning (which relies on preference labels or reward models) with self-supervised contrastive objectives. By constructing batches of (input, principle, response) triples, and using in-batch negatives (alternative principles or completions), the model is trained to make the response uniquely probable under its generating principle, thus maximizing a variational lower bound on the conditional mutual information.
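For intuition, the standard InfoNCE argument, stated here for the SAMI setting as a sketch rather than quoted from the cited papers, bounds the conditional mutual information for a batch of $N$ matched $(x, c_i, y_i)$ triples:

$$I(Y; C \mid X) \;\geq\; \log N \;-\; \mathbb{E}\!\left[-\frac{1}{N}\sum_{i=1}^{N}\log \frac{p_\theta(y_i \mid x, c_i)}{\sum_{j=1}^{N} p_\theta(y_i \mid x, c_j)}\right],$$

where the expectation is over batches of matched triples. Minimizing the bracketed contrastive loss therefore tightens the lower bound, which is capped at $\log N$ for a batch of $N$ in-batch negatives; the symmetric variant applies the same argument with responses, rather than principles, serving as the negatives.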
This paradigm is agnostic to the actual content or structure of the principles, enabling alignment with arbitrary sets of nuanced behaviors, stylistic constraints, or task requirements.
2. Core Algorithm and Mutual Information Objective
The canonical SAMI algorithm is executed in the following iterative loop (Fränken et al., 22 Apr 2024, Govande, 2 Oct 2024, Seneque et al., 13 Oct 2025):
- Principle Sampling: For each prompt $x$, sample or synthesize $N$ candidate principles $c_1, \dots, c_N$. Principles may be generated using a “principle writer” model and can be combined into “constitutions” (i.e., multi-principle sets).
- Response Generation: Generate a candidate response $y_i$ for each pair $(x, c_i)$ using the current model.
- Log-Probability Matrix Construction: For all pairs $(i, j)$, compute $\ell_{ij} = \log p_\theta(y_i \mid x, c_j)$, i.e., evaluate how likely response $y_i$ is under every alternative principle $c_j$ (with the prompt $x$ held fixed).
- Row/Column-Wise Normalization: Construct contrastive normalizations of $[\ell_{ij}]$ in both directions: across principles for each fixed response (within rows) and across responses for each fixed principle (within columns).
- Symmetric InfoNCE Loss: The symmetric InfoNCE loss combines the principle-contrastive term
  $$\mathcal{L}_{\text{prin}} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{\exp(\ell_{ii})}{\sum_{j=1}^{N} \exp(\ell_{ij})}$$
  and the response-contrastive term
  $$\mathcal{L}_{\text{resp}} = -\frac{1}{N}\sum_{j=1}^{N} \log \frac{\exp(\ell_{jj})}{\sum_{i=1}^{N} \exp(\ell_{ij})}$$
  into a total loss
  $$\mathcal{L}_{\text{SAMI}} = \lambda_{1}\,\mathcal{L}_{\text{prin}} + \lambda_{2}\,\mathcal{L}_{\text{resp}},$$
  where $\lambda_{1}$ and $\lambda_{2}$ are tunable annealing factors (Seneque et al., 13 Oct 2025).
- Finetuning: Update the model parameters $\theta$ to minimize $\mathcal{L}_{\text{SAMI}}$ for just a few gradient steps to avoid collapse or feature drift.
This two-sided InfoNCE objective is a tractable lower bound on $I(Y; C \mid X)$ and serves to “bind” completions (Y) to their principles (C), with distractor negatives sharpening the association.
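As a concrete illustration, the following minimal PyTorch sketch computes the symmetric InfoNCE loss from a precomputed $N \times N$ matrix of log-probabilities $\ell_{ij} = \log p_\theta(y_i \mid x, c_j)$; the matrix construction (tokenization, forward passes, any length normalization) is assumed to happen elsewhere, and the weighting factors are illustrative placeholders rather than values prescribed by the cited papers.

```python
import torch
import torch.nn.functional as F


def sami_infonce_loss(logp: torch.Tensor,
                      lambda_principles: float = 1.0,
                      lambda_responses: float = 1.0) -> torch.Tensor:
    """Symmetric InfoNCE loss over an N x N log-probability matrix.

    logp[i, j] = log p_theta(y_i | x, c_j): likelihood of response i under
    principle (constitution) j, with the prompt x held fixed. Diagonal
    entries correspond to the matched (response, principle) pairs.
    """
    n = logp.size(0)
    targets = torch.arange(n, device=logp.device)

    # Contrast each response against all principles (softmax within each row).
    loss_principles = F.cross_entropy(logp, targets)

    # Contrast each principle against all responses (softmax within each column).
    loss_responses = F.cross_entropy(logp.t(), targets)

    return lambda_principles * loss_principles + lambda_responses * loss_responses


if __name__ == "__main__":
    # Toy check with a random 4 x 4 matrix standing in for real log-probabilities.
    fake_logp = torch.randn(4, 4)
    print(sami_infonce_loss(fake_logp))
```

Because the loss is just a cross-entropy against the diagonal in both directions, driving it down makes each response most probable under the principle that generated it, and each principle most predictive of its own response.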
| Phase | Role | Key Operation |
|---|---|---|
| Principle sampling | Generate diverse sets of behavioral/task principles | Model or dataset sampling |
| Response generation | Obtain completions for input/principle combinations | Model inference |
| Log-probability matrix | Compute likelihoods of completions under alternative principles | LM forward pass |
| InfoNCE normalization | Contrast correct pairings with in-batch negatives | Row/column softmax |
| SAMI loss minimization | Update model to maximize conditional MI | Gradient descent |
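The following schematic ties the phases in the table into one outer update. `sample_constitutions`, `generate_response`, and `sequence_logprob` are hypothetical helpers standing in for the principle writer, policy sampling, and a sequence log-likelihood forward pass; they are not APIs from the cited papers, and the hyperparameters are placeholders.

```python
import torch


def sami_step(model, optimizer, prompt, principle_writer, n_pairs: int = 8):
    """One schematic SAMI update: sample principles, generate responses,
    score every (response, principle) pair, and take a gradient step."""
    # Phase 1: principle sampling (constitutions written by another model).
    constitutions = sample_constitutions(principle_writer, prompt, n=n_pairs)

    # Phase 2: response generation with the current policy (no gradients needed here).
    with torch.no_grad():
        responses = [generate_response(model, prompt, c) for c in constitutions]

    # Phase 3: log-probability matrix, logp[i, j] = log p_theta(y_i | x, c_j),
    # where sequence_logprob is assumed to return a differentiable scalar tensor.
    logp = torch.stack([
        torch.stack([sequence_logprob(model, prompt, c_j, y_i) for c_j in constitutions])
        for y_i in responses
    ])

    # Phases 4-5: symmetric InfoNCE loss (previous sketch) and a parameter update;
    # in practice only a small number of such steps are taken per iteration.
    loss = sami_infonce_loss(logp)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```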
3. Performance Characteristics and Comparative Analysis
Empirical results across dialogue, summarization, and multi-task evaluations (Fränken et al., 22 Apr 2024, Govande, 2 Oct 2024) demonstrate the following:
- Win Rates vs. Baselines: On single-turn dialogue with the HH-RLHF dataset, a SAMI-trained Mistral-7B achieves 66–77% win rates over the base model and surpasses instruct-finetuned models (e.g., Mistral-7B-Instruct) with win rates up to 57%. On summarization (TL;DR), strong models such as Mixtral-8x7B aligned using weakly written constitutions (by Mistral-7B-Instruct) achieve a 65% win rate.
- Pluralistic Alignment: Application to multi-task settings (MT-Bench) shows a SAMI win rate of 57% against Direct Preference Optimization (DPO), with notable improvements on principle-sensitive tasks such as math and roleplay (Govande, 2 Oct 2024).
- Mathematical Accuracy: SAMI alone yields smaller single-attempt accuracy gains on GSM-8K (+1.1%) than SFT (+3.2%), but its gains grow with the number of attempts (+3.9% at n = 10), and combining SAMI with SFT yields a further improvement in the multi-attempt setting (+1.3%).
Unlike SFT and DPO, which demand explicit reference completions or preference pairs, SAMI operates solely on self-generated data and model-derived constitutions.
4. Model Alignment and Principle Generalization
SAMI supports “weak-to-strong” model alignment, where a larger model (e.g., Mixtral-8x7B) can be aligned using constitutions authored by a weaker model (e.g., Mistral-7B-Instruct) (Fränken et al., 22 Apr 2024). The approach achieves robust behavioral adaptation and principle adherence independent of the principle-writer’s intrinsic capability. This separation allows scalable construction of normative guidelines without requiring the most capable models at each stage.
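As a sketch of how these roles could be wired together, reusing the hypothetical helpers from the previous sketches (`load_model`, the prompt source, and the optimizer settings are likewise illustrative assumptions, not the setup of the cited experiments):

```python
import torch

# Weak-to-strong wiring: a weaker instruct model writes the constitutions,
# while the stronger base model is the policy actually being aligned.
principle_writer = load_model("Mistral-7B-Instruct")          # hypothetical loader
policy = load_model("Mixtral-8x7B")                           # hypothetical loader
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-6)   # placeholder hyperparameters

for prompt in summarization_prompts:  # illustrative prompt source, e.g., TL;DR posts
    sami_step(policy, optimizer, prompt, principle_writer)    # from the previous sketch
```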
Further, SAMI generalizes to diverse principle sets and domains. Experiments on learned and held-out summarization principles (e.g., “summaries should be scientific”) show win rates up to 68% relative to the pretrained base model, indicating successful transfer of principle-conditional alignment (Fränken et al., 22 Apr 2024).
A related information-geometric extension, ENIGMA, demonstrates that maximizing $I(Y; C \mid X)$ via a symmetric InfoNCE auxiliary objective induces desirable structural changes in the hidden-state manifold of the model, promoting robust, principle-bound Chain-of-Thought reasoning without reward modeling (Seneque et al., 13 Oct 2025).
5. Unified Information-Theoretic Perspective
SAMI’s objective connects directly to recent advances in information geometry and mutual information bounds. The variational InfoNCE construction provides a practical lower bound on the intractable conditional mutual information, and the symmetric (row plus column) loss formulation ensures that both response specificity and principle distinguishability are optimized.
In ENIGMA (Seneque et al., 13 Oct 2025), a Sufficiency Index (SI) further quantifies how well a given principle set improves token-level likelihoods, increases InfoNCE bounds, and separates principle classes, serving as a selection tool for candidate constitutions.
Model runs with high-SI principle sets show improved training stability and performance on QA and truthfulness benchmarks, affirming that principle content, and not just InfoNCE magnitude, is critical to effective SAMI-based alignment.
6. Practical Implications and Future Directions
The main advantages of SAMI include:
- No reliance on preference labels or reward models: Alignment is achieved via mutual information optimization using only self-generated data.
- Low overhead, fast adaptation: Iterative fine-tuning is computationally lightweight, often involving very few gradient steps.
- Pluralistic and scalable steering: The approach accommodates arbitrary, even multi-principle, constitutions and supports large-scale or multi-task adaptation.
- Principled diagnostics: Mutual information bounds and sufficiency indices provide quantitative metrics for alignment quality and principle set selection.
This suggests that future work will focus on:
- Improved principle synthesis and selection (via SI maximization, weighting, or discovery)
- Extension to domains beyond LMs (e.g., T2I models, multimodal generative systems)
- Synergistic combination with SFT, RLHF, and entropy-based regularizers for robust and interpretable alignment
- Enhanced scaling strategies and low-resource adaptation techniques
A plausible implication is that SAMI could enable modular, auditable behavioral control in foundation models—prompting next-generation research in constitutional AI, robust policy alignment, and information-theoretic diagnostics for trust and safety.
7. Summary Table: Key Components of SAMI-Based Alignment
| Component | Implementation Feature | Experimentally Supported Outcomes |
|---|---|---|
| Conditional MI | Symmetric InfoNCE loss | Improved “win rate” alignment benchmarks |
| Principle diversity | Batch-sampled constitutions | Generalization to held-out principles |
| Self-supervision | No preference labels required | Reduced annotation cost |
| Model-based writing | Weak-to-strong alignment | Scalability, independence from labeler |
| SI diagnostics | Sufficiency Index measurement | Predicts improved training and alignment |
SAMI thus offers a theoretically principled, empirically supported, and computationally practical route for robust self-supervised alignment of large models with nuanced normative principles, without dependence on external preference labels or reward modeling infrastructure.