Self-Supervised Alignment with Mutual Info

Updated 20 October 2025
  • SAMI is a framework that uses self-supervised contrastive learning to maximize conditional mutual information, aligning model responses with behavioral principles.
  • It replaces traditional reinforcement learning by using a symmetric InfoNCE loss and in-batch negatives to enforce clear binding between responses and their governing principles.
  • SAMI enables scalable, pluralistic model alignment without explicit preference labels, achieving robust performance in dialogue, summarization, and multi-task evaluations.

Self-Supervised Alignment with Mutual Information (SAMI) refers to a set of learning principles and algorithmic strategies that use mutual information maximization to align model predictions or representations with conditioning signals—such as principles, constitutions, or attributes—without reliance on explicit preference labels or reward models. In contemporary LLMs and multimodal systems, SAMI designates a family of iterative, information-theoretic approaches that optimize a contrastive (often InfoNCE) lower bound on the conditional mutual information I(Y; C | X), where Y is the model output, C is a set of behavioral or task principles, and X is the input. Recent developments have established SAMI as a scalable, self-supervised alignment solution for large generative models, especially in applications where human preferences are difficult to collect or generalize across tasks.

1. Foundational Concepts and Objectives

SAMI formalizes model alignment as the maximization of the conditional mutual information between the model’s response and a principle set (or constitution), given the task input:

$$I(Y; C \mid X)$$

This measure quantifies the dependence of the output Y on the principle C, conditioned on an input X. The operational goal is to ensure that, for a given prompt, generated content not only appears plausible but also encodes and disambiguates the underlying principles it is meant to embody.
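
For reference, this conditional mutual information can be written in its standard form (a textbook identity, stated here for orientation rather than quoted from the SAMI papers):

$$I(Y; C \mid X) = \mathbb{E}_{p(x, c, y)}\!\left[\log \frac{p(y \mid x, c)}{p(y \mid x)}\right]$$

Maximizing this quantity therefore rewards the model for assigning higher likelihood to a response under the principle that generated it than under the principle-marginalized policy $p(y \mid x)$.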

The key methodological innovation is to substitute traditional reinforcement learning (which relies on preference labels or reward models) with self-supervised contrastive objectives. By constructing batches of (input, principle, response) triples, and using in-batch negatives (alternative principles or completions), the model is trained to make the response uniquely probable under its generating principle, thus maximizing a variational lower bound on the conditional mutual information.
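
A minimal sketch of how such a batch might be scored is given below; the helper `build_logprob_matrix` and the `score_log_prob` callable are hypothetical interfaces, assumed here for illustration rather than taken from the published implementations.

```python
import torch

def build_logprob_matrix(prompt, principles, responses, score_log_prob):
    """Build the C x C matrix S with S[j, k] = log p(y_j | prompt, c_k).

    responses[j] is the completion generated under principles[j];
    score_log_prob(prompt, principle, response) is a user-supplied callable
    returning the summed token log-likelihood of the response under the
    current model, conditioned on the prompt and principle.
    """
    C = len(principles)
    S = torch.empty(C, C)
    for j in range(C):        # response generated under principle j
        for k in range(C):    # scored under (possibly different) principle k
            S[j, k] = score_log_prob(prompt, principles[k], responses[j])
    return S

# Toy usage with a dummy scorer (a real scorer would run the LM):
dummy = lambda x, c, y: 0.0 if c in y else -1.0
S = build_logprob_matrix("prompt", ["concise", "formal"],
                         ["a concise reply", "a formal reply"], dummy)
print(S)  # diagonal entries (matched pairs) score highest
```

The diagonal of this matrix holds the "positive" pairings; every off-diagonal entry serves as an in-batch negative for the contrastive objective detailed in Section 2.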

This paradigm is agnostic to the actual content or structure of the principles, enabling alignment with arbitrary sets of nuanced behaviors, stylistic constraints, or task requirements.

2. Core Algorithm and Mutual Information Objective

The canonical SAMI algorithm is executed in the following iterative loop (Fränken et al., 22 Apr 2024, Govande, 2 Oct 2024, Seneque et al., 13 Oct 2025):

  1. Principle Sampling: For each prompt $x_i$, sample or synthesize $C$ possible principles $c_j$. Principles may be generated using a “principle writer” model and can be combined into “constitutions” (i.e., multi-principle sets).
  2. Response Generation: Generate candidate responses $y_{ij}$ for each $(x_i, c_j)$ pair using the current model.
  3. Log Probability Matrix Construction: For all pairs, compute $\log \pi(y_{ij} \mid x_i, c_k)$ for $j, k = 1, \ldots, C$, i.e., evaluate how likely response $y_{ij}$ is under every alternative principle $c_k$ (with $x_i$ held fixed).
  4. Row/Column-Wise Normalization: Construct contrastive normalization across responses (rows) and across principles (columns).
  5. Symmetric InfoNCE Loss: The symmetric InfoNCE loss combines

$$L_{\text{row}} = -\frac{1}{N} \sum_{i} \log \frac{e^{S_{ii}}}{\sum_j e^{S_{ij}}}$$

and

$$L_{\text{col}} = -\frac{1}{N} \sum_{j} \log \frac{e^{S_{jj}}}{\sum_i e^{S_{ij}}}$$

into a total loss

$$L_{\text{SAMI}} = \lambda_{\text{row}} L_{\text{row}} + \lambda_{\text{col}} L_{\text{col}}$$

where $S_{ij} = \log p_\theta(y_i \mid x_i, c_j)$ and $\lambda_{\text{row}}, \lambda_{\text{col}}$ are tunable annealing factors (Seneque et al., 13 Oct 2025).

  6. Finetuning: Update model parameters to minimize $L_{\text{SAMI}}$ for just a few steps to avoid collapse or feature drift.

This two-sided InfoNCE objective is a tractable lower bound on $I(Y; C \mid X)$ and serves to “bind” completions $Y$ to their principles $C$, with the distractor negatives sharpening the association.
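
A minimal PyTorch sketch of the symmetric loss is shown below, assuming `logp` is the $C \times C$ log-likelihood matrix for a single prompt (entry $(j, k)$ holds $\log p_\theta(y_j \mid x, c_k)$); the function name `sami_loss` and the default weights are illustrative rather than drawn from the cited codebases.

```python
import torch
import torch.nn.functional as F

def sami_loss(logp: torch.Tensor,
              lambda_row: float = 1.0,
              lambda_col: float = 1.0) -> torch.Tensor:
    """Symmetric InfoNCE loss over a square log-probability matrix.

    logp[j, k] = log p_theta(y_j | x, c_k): the response generated under
    principle j, scored under principle k. Diagonal entries are the matched
    (positive) pairings; off-diagonal entries act as in-batch negatives.
    """
    targets = torch.arange(logp.size(0), device=logp.device)
    # Row term: each response should be most probable under its own principle.
    loss_row = F.cross_entropy(logp, targets)       # softmax over principles (columns)
    # Column term: each principle should best explain its own response.
    loss_col = F.cross_entropy(logp.t(), targets)   # softmax over responses (rows)
    return lambda_row * loss_row + lambda_col * loss_col

# Toy usage: a 3x3 log-probability matrix for one prompt.
logp = torch.randn(3, 3)
print(sami_loss(logp).item())
```

Because `cross_entropy` applies a log-softmax internally, the row term reproduces $L_{\text{row}}$ and the transposed call reproduces $L_{\text{col}}$ from the formulas above.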

| Phase | Role | Key Operation |
|---|---|---|
| Principle sampling | Generate diverse sets of behavioral/task principles | Model or dataset sampling |
| Response generation | Obtain completions for input/principle combinations | Model inference |
| Log-probability matrix | Compute likelihoods of completions under alternative principles | LM forward pass |
| InfoNCE normalization | Contrast correct pairings with in-batch negatives | Row/column softmax |
| SAMI loss minimization | Update model to maximize conditional MI | Gradient descent |

3. Performance Characteristics and Comparative Analysis

Empirical results across dialogue, summarization, and multi-task evaluations (Fränken et al., 22 Apr 2024, Govande, 2 Oct 2024) demonstrate the following:

  • Win Rates vs. Baselines: On single-turn dialogue with the HH-RLHF dataset, a SAMI-trained Mistral-7B achieves 66–77% win rates over the base model and surpasses instruct-finetuned models (e.g., Mistral-7B-Instruct) with win rates up to 57%. On summarization (TL;DR), a strong model (Mixtral-8x7B) aligned using constitutions written by the weaker Mistral-7B-Instruct achieves a 65% win rate.
  • Pluralistic Alignment: Application to multi-task settings (MT-Bench) shows a SAMI win rate of 57% against Direct Preference Optimization (DPO), with notable improvements in principles-sensitive tasks such as math and roleplay (Govande, 2 Oct 2024).
  • Mathematical Accuracy: SAMI yields small single-attempt accuracy gains (e.g., +1.1% on GSM-8K) compared to SFT (+3.2%), but gains scale with the number of attempts (n = 10: +3.9%), and combining SAMI with SFT yields further improvements (+1.3% multi-attempt).

Unlike SFT and DPO, which demand explicit reference completions or preference pairs, SAMI operates solely on self-generated data and model-derived constitutions.

4. Model Alignment and Principle Generalization

SAMI supports “weak-to-strong” model alignment, where a larger model (e.g., Mixtral-8x7B) can be aligned using constitutions authored by a weaker model (e.g., Mistral-7B-Instruct) (Fränken et al., 22 Apr 2024). The approach achieves robust behavioral adaptation and principle adherence independent of the principle-writer’s intrinsic capability. This separation allows scalable construction of normative guidelines without requiring the most capable models at each stage.

Further, SAMI generalizes to diverse principle sets and domains. Experiments on learned and held-out summarization principles (e.g., “summaries should be scientific”) show win rates up to 68% relative to the pretrained base model, indicating successful transfer of principle-conditional alignment (Fränken et al., 22 Apr 2024).

A related information-geometric extension, ENIGMA, demonstrates that maximizing $I(Y; C \mid X)$ via a symmetric InfoNCE auxiliary induces desirable structural changes in the hidden-state manifold of the model, promoting robust, principle-bound Chain-of-Thought reasoning without reward modeling (Seneque et al., 13 Oct 2025).

5. Unified Information-Theoretic Perspective

SAMI’s objective is unified with recent advances in information geometry and mutual information bounds. The variational InfoNCE construction provides a practical lower bound on the intractable mutual information, and the symmetric (row+column) loss formulation ensures that both response specificity and principle distinguishability are optimized.
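
Concretely, following the standard InfoNCE analysis (this statement of the bound is illustrative and not quoted from the SAMI papers), each one-sided term yields a bound of the form

$$I(Y; C \mid X) \geq \log N - L_{\text{row}}$$

where $N$ is the number of in-batch candidates, so driving the contrastive loss below $\log N$ certifies a strictly positive estimate of captured conditional mutual information; the column term plays the symmetric role for distinguishing principles.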

In ENIGMA (Seneque et al., 13 Oct 2025), a Sufficiency Index (SI) further quantifies how well a given principle set improves token-level likelihoods, increases InfoNCE bounds, and separates principle classes, serving as a selection tool:

$$\text{SI} = w_b \cdot z_{\text{bits}} + w_m \cdot z_{\text{MI}} + w_s \cdot z_{\text{sep}}$$

Model runs with high SI principle sets show improved training stability and performance on QA and truthfulness benchmarks, affirming that not just InfoNCE magnitude, but principle content, is critical to effective SAMI-based alignment.
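
A minimal sketch of how such an index could be computed is shown below, assuming the three raw diagnostics are z-scored across the candidate principle sets; the function name, metric arrays, and equal default weights are illustrative assumptions, not the exact procedure of Seneque et al.

```python
import numpy as np

def sufficiency_index(bits: np.ndarray, mi: np.ndarray, sep: np.ndarray,
                      w_b: float = 1.0, w_m: float = 1.0, w_s: float = 1.0) -> np.ndarray:
    """Combine z-scored diagnostics into a Sufficiency Index per candidate principle set.

    bits: token-level likelihood improvement for each candidate set
    mi:   estimated InfoNCE lower bound for each candidate set
    sep:  separation between principle classes for each candidate set
    """
    def z(v: np.ndarray) -> np.ndarray:
        return (v - v.mean()) / (v.std() + 1e-8)  # z-score across candidate sets
    return w_b * z(bits) + w_m * z(mi) + w_s * z(sep)

# Toy usage: score three candidate constitutions and keep the highest-SI one.
si = sufficiency_index(np.array([0.10, 0.25, 0.05]),
                       np.array([1.2, 1.8, 0.9]),
                       np.array([0.4, 0.7, 0.3]))
print(int(si.argmax()))  # index of the preferred principle set
```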

6. Practical Implications and Future Directions

The main advantages of SAMI include:

  • No reliance on preference labels or reward models: Alignment is achieved via mutual information optimization using only self-generated data.
  • Low overhead, fast adaptation: Iterative fine-tuning is computationally lightweight, often involving very few gradient steps.
  • Pluralistic and scalable steering: The approach accommodates arbitrary, even multi-principle, constitutions and supports large-scale or multi-task adaptation.
  • Principled diagnostics: Mutual information bounds and sufficiency indices provide quantitative metrics for alignment quality and principle set selection.

This suggests that future work will focus on:

  • Improved principle synthesis and selection (via SI maximization, weighting, or discovery)
  • Extension to domains beyond LMs (e.g., T2I models, multimodal generative systems)
  • Synergistic combination with SFT, RLHF, and entropy-based regularizers for robust and interpretable alignment
  • Enhanced scaling strategies and low-resource adaptation techniques

A plausible implication is that SAMI could enable modular, auditable behavioral control in foundation models—prompting next-generation research in constitutional AI, robust policy alignment, and information-theoretic diagnostics for trust and safety.

7. Summary Table: Key Components of SAMI-Based Alignment

| Principle | Implementation Feature | Experimentally Supported Outcomes |
|---|---|---|
| Conditional MI | Symmetric InfoNCE loss | Improved “win rate” on alignment benchmarks |
| Principle diversity | Batch-sampled constitutions | Generalization to held-out principles |
| Self-supervision | No preference labels required | Reduced annotation cost |
| Model-based writing | Weak-to-strong alignment | Scalability, independence from labeler |
| SI diagnostics | Sufficiency Index measurement | Predicts improved training and alignment |

SAMI thus offers a theoretically principled, empirically supported, and computationally practical route for robust self-supervised alignment of large models with nuanced normative principles, without dependence on external preference labels or reward modeling infrastructure.
