Preference Mode Collapse in Generative Models
- Preference Mode Collapse is characterized by a model’s output concentrating on a narrow set of modes that maximize reward, sacrificing diversity.
- It is quantified using metrics such as support-size collapse, entropy drop, and spatial dispersion, highlighting its impact in both diffusion models and LLMs.
- Mitigation strategies like D²-Align for diffusion models and Verbalized Sampling for LLMs show promise in restoring diversity while maintaining alignment quality.
Preference Mode Collapse (PMC) refers to a systemic failure mode in preference-driven training or reinforcement learning from human feedback (RLHF), in which an aligned generative model—such as a diffusion model or an LLM—collapses its output distribution onto a narrow, reward-exploiting subset of modes. This phenomenon severely degrades the diversity of generated samples, producing outputs that optimize alignment metrics but fail to capture the full spectrum of human preferences or creative variety (Chen et al., 30 Dec 2025, Zhang et al., 1 Oct 2025).
1. Formal Definition and Manifestation
PMC is characterized by the concentration of a model’s conditional output law $p_\theta(\cdot \mid c)$ (for a generative model $p_\theta$ and condition $c$) onto a “high-score” manifold $\mathcal{M}^\star = \{\, y : r(y, c) \approx \max_{y'} r(y', c) \,\}$, where $r(y, c)$, a scalar reward function or learned preference model, is maximized. Formally,

$$\Pr_{y \sim p_\theta(\cdot \mid c)}\bigl[\, y \in \mathcal{M}^\star \,\bigr] \to 1,$$

even as $p_\theta(\cdot \mid c)$ fails to capture the true diversity or plurality of acceptable outputs per human judgment. In text-to-image diffusion, this can manifest as all “Cubism” outputs being trivially overexposed or as the generator learning a monolithic “glossy, highly-lit” style favored by the reward model’s biases, despite human preference for varied artistic interpretations (Chen et al., 30 Dec 2025).
Analogously, in LLMs, preference mode collapse is evidenced when the aligned model assigns most probability mass to a small subset of completions, reducing entropy and the support size of generations. Metrics include (a minimal computation is sketched below):
- Support-size collapse: $\bigl|\{\, y : \pi_{\text{aligned}}(y \mid x) > \epsilon \,\}\bigr| \ll \bigl|\{\, y : \pi_{\text{base}}(y \mid x) > \epsilon \,\}\bigr|$ for small $\epsilon > 0$.
- Entropy collapse: $H\bigl(\pi_{\text{aligned}}(\cdot \mid x)\bigr) \ll H\bigl(\pi_{\text{base}}(\cdot \mid x)\bigr)$ (Zhang et al., 1 Oct 2025).
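A minimal sketch of these two statistics, assuming explicit per-completion probabilities are available; the dictionaries `p_base` and `p_aligned` below are hypothetical placeholders for distributions estimated from the base and aligned models:

```python
import math

def support_size(p, eps=1e-3):
    """Number of completions receiving more than eps probability mass."""
    return sum(1 for prob in p.values() if prob > eps)

def entropy(p):
    """Shannon entropy (nats) of a discrete distribution over completions."""
    return -sum(prob * math.log(prob) for prob in p.values() if prob > 0)

# Hypothetical per-completion probabilities for the same prompt.
p_base    = {"a": 0.30, "b": 0.25, "c": 0.20, "d": 0.15, "e": 0.10}
p_aligned = {"a": 0.97, "b": 0.02, "c": 0.01}  # mass piled onto one mode

print("support:", support_size(p_base), "->", support_size(p_aligned))
print("entropy:", round(entropy(p_base), 3), "->", round(entropy(p_aligned), 3))
```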
2. Benchmarking and Quantification
PMC is systematically measured using multi-dimensional diversity benchmarks. In diffusion, DivGenBench is employed, comprising 3200 “keyword-driven” prompts across four orthogonal axes: Identity, Artistic Style, Layout, and Tonal/Photographic properties. Key metrics are listed below; an IDS-style computation is sketched after the table:
| Metric | Diversity Axis | Formula/Description |
|---|---|---|
| IDS | Identity | Mean pairwise cosine similarity of ArcFace embeddings; lower indicates more identity diversity |
| ASC | Artistic Style | Retrieval fraction matching real artistic styles; higher indicates closer coverage of real styles |
| SDI | Layout | Dispersion of spatial bounding-box IoU; higher indicates more layout diversity |
| PVS | Tonal/Photographic | Std. dev. of HSV and contrast statistics; higher indicates more tonal/photographic diversity |
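As an illustration of the IDS-style measurement, the sketch below averages pairwise cosine similarities over a set of identity embeddings; the random `embeddings` array stands in for precomputed ArcFace features, and the exact aggregation used by DivGenBench may differ.

```python
import numpy as np

def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of row vectors."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T                  # full cosine-similarity matrix
    iu = np.triu_indices(len(embeddings), k=1)  # upper triangle, excluding diagonal
    return float(sims[iu].mean())

# Hypothetical identity embeddings for generations from one prompt.
embeddings = np.random.default_rng(0).normal(size=(16, 512))
print("IDS-like score (lower = more identity diversity):",
      round(mean_pairwise_cosine(embeddings), 3))
```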
For LLMs, quantification draws on metrics such as Distinct-n (unique n-grams), semantic embedding diversity, and entropy. Additionally, Coverage-N measures the breadth of responses in tasks such as dialogue or open-ended QA (Zhang et al., 1 Oct 2025).
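A minimal Distinct-n computation over a handful of placeholder completions; the ratio of unique to total n-grams falls as samples become repetitive:

```python
def distinct_n(samples, n=2):
    """Ratio of unique n-grams to total n-grams across all samples."""
    ngrams = []
    for text in samples:
        tokens = text.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / max(len(ngrams), 1)

samples = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the sleepy cat",
    "a slow green turtle crawls under the old fence",
]
print("Distinct-2:", round(distinct_n(samples, n=2), 3))
```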
3. Theoretical Drivers of PMC
PMC arises due to inherent biases in reward modeling and preference data:
- Reward Model Bias: In diffusion, reward functions develop intrinsic affinities (e.g., toward specific color palettes). RLHF over-optimizes along these directions: because the learned reward is strongly correlated with a particular bias direction in its embedding space, the policy collapses onto the regime where the component along that direction is maximal (Chen et al., 30 Dec 2025).
- Typicality Bias in Annotation: In LLM alignment, human annotators systematically favor “typical” completions, as captured by the base model’s (pre-alignment) likelihood. The learned preference function decomposes as
$$r(x, y) = r_{\text{true}}(x, y) + \alpha \log \pi_{\text{base}}(y \mid x),$$
where $\alpha > 0$ quantifies typicality bias. The resulting aligned model sharpens the pretraining distribution: at the RLHF optimum with KL weight $\beta$,
$$\pi_{\text{aligned}}(y \mid x) \propto \pi_{\text{base}}(y \mid x)^{\,1 + \alpha/\beta} \exp\bigl(r_{\text{true}}(x, y)/\beta\bigr),$$
so diversity collapses when many outputs are tied in true utility (Zhang et al., 1 Oct 2025); a toy numeric illustration of this sharpening follows below.
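A toy numeric illustration of the sharpening effect, assuming the closed-form optimum reconstructed above: when candidates are tied in true utility, the aligned distribution is the base distribution raised to a power greater than one, so its entropy strictly drops for any positive $\alpha$.

```python
import numpy as np

def sharpen(p_base, alpha, beta):
    """Aligned distribution when all candidates have equal true utility."""
    q = p_base ** (1.0 + alpha / beta)
    return q / q.sum()

def entropy(p):
    return float(-(p * np.log(p)).sum())

p_base = np.array([0.4, 0.3, 0.2, 0.1])          # pretraining distribution over tied outputs
p_aligned = sharpen(p_base, alpha=1.0, beta=0.5)  # positive typicality-bias weight

print("base entropy   :", round(entropy(p_base), 3))
print("aligned entropy:", round(entropy(p_aligned), 3))  # strictly smaller for alpha > 0
```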
4. Mitigation Strategies
4.1. Directional Decoupling Alignment (D²-Align) for Diffusion
D²-Align mitigates PMC by introducing a learned correction direction $\Delta$ in the reward model’s CLIP embedding space:
- Stage 1: With the generator frozen, $\Delta$ is optimized by constructing guided text and image embeddings and adjusting the reward function so that the dominant bias direction is decoupled from the alignment signal.
- Stage 2: With $\Delta$ fixed, the generator is aligned via RL against the decoupled reward.
This yields improved sample diversity while maintaining reward quality. Only ∼20 RL steps are needed in Stage 2, compared to 300+ for baselines (Chen et al., 30 Dec 2025).
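The following is a deliberately schematic sketch of the two-stage decoupling idea on synthetic embeddings, not the published D²-Align procedure: the toy reward model, the reward-weighted estimate of $\Delta$, the regression-based correction, and the top-k selection used as a stand-in for RL alignment are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
DIM = 32

# Toy stand-ins: a hidden bias direction and a biased scalar reward on embeddings.
hidden_bias = rng.normal(size=DIM)
hidden_bias /= np.linalg.norm(hidden_bias)

def true_quality(e):                 # hypothetical "real" quality the reward should track
    return -np.abs(e[:, 0])

def reward(e):                       # biased reward: quality plus a bias-direction term
    return true_quality(e) + 2.0 * (e @ hidden_bias)

# Stage 1 (schematic): with the generator frozen, estimate a correction direction
# Delta as the reward-weighted mean embedding direction.
probe = rng.normal(size=(2000, DIM))
w = reward(probe) - reward(probe).mean()
delta = (w[:, None] * probe).mean(axis=0)
delta /= np.linalg.norm(delta)

# Stage 2 (schematic): align against a decoupled reward that regresses out
# the component of the score explained by Delta.
proj = probe @ delta
slope = float(np.dot(reward(probe), proj) / np.dot(proj, proj))

def decoupled_reward(e):
    return reward(e) - slope * (e @ delta)

# Proxy for "where the policy concentrates": the top-scoring candidates.
cands = rng.normal(size=(5000, DIM))
top_raw = cands[np.argsort(-reward(cands))[:100]]
top_dec = cands[np.argsort(-decoupled_reward(cands))[:100]]

print("mean projection onto hidden bias (raw reward):      ",
      round(float((top_raw @ hidden_bias).mean()), 3))
print("mean projection onto hidden bias (decoupled reward):",
      round(float((top_dec @ hidden_bias).mean()), 3))
```

The sketch only reproduces the qualitative effect: removing the bias component before optimization keeps high-scoring samples from clustering along the reward model’s preferred direction.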
4.2. Verbalized Sampling (VS) for LLMs
VS is a training-free, inference-time remedy exploiting the model’s pretraining distribution. The central mechanism is to prompt the model to output several completions and their probabilities (“verbalized distribution”):
- Instance-level: generate one sample per call.
- List-level: generate a list of $k$ samples per call.
- Distribution-level (VS-Standard): generate $k$ samples per call together with their explicit probabilities, which sum to 1.
Repeated VS calls recover most of the pre-alignment model’s diversity, mitigating the effect of collapse imposed by typicality bias (Zhang et al., 1 Oct 2025).
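A minimal sketch of a distribution-level (VS-Standard) call and of sampling from the verbalized distribution it returns; the prompt wording, the JSON schema, and the `call_llm` stub are illustrative assumptions rather than the paper’s exact template.

```python
import json
import random

VS_PROMPT = (
    "Generate 5 different responses to the task below. "
    "Return JSON: a list of objects with fields 'text' and 'probability', "
    "where the probabilities reflect how likely each response is and sum to 1.\n\n"
    "Task: Write a one-line opening for a short story about the sea."
)

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for whatever chat-completion API is in use."""
    # In practice this would call the deployed model; here we return a canned reply.
    return json.dumps([
        {"text": "The tide kept the village's secrets better than its people did.", "probability": 0.4},
        {"text": "Salt had crept into everything, even the letters she never sent.", "probability": 0.3},
        {"text": "The lighthouse blinked twice, then went dark for good.", "probability": 0.2},
        {"text": "Nobody remembered when the gulls stopped coming back.", "probability": 0.1},
    ])

candidates = json.loads(call_llm(VS_PROMPT))
texts = [c["text"] for c in candidates]
probs = [c["probability"] for c in candidates]

# Sample from the verbalized distribution instead of taking the single modal reply.
print(random.choices(texts, weights=probs, k=1)[0])
```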
5. Empirical Evidence
5.1. Diffusion Models (DivGenBench; Reward: HPS-v2.1)
D²-Align achieves the best diversity-quality trade-off among evaluated RL-based approaches:
| Method | IDS (↓) | ASC (↑) | SDI (↑) | PVS (↑) |
|---|---|---|---|---|
| FLUX | 0.280 | 0.179 | 0.563 | 0.408 |
| DanceGRPO | 0.348 | 0.130 | 0.488 | 0.259 |
| FlowGRPO | 0.391 | 0.044 | 0.389 | 0.168 |
| SRPO | 0.259 | 0.234 | 0.580 | 0.352 |
| D²-Align | 0.251 | 0.253 | 0.636 | 0.412 |
- D²-Align combines near-best aesthetic and PickScore results with the strongest diversity on all four DivGenBench axes.
- Qualitatively, it generates distinct identities, authentic stylistic coverage, varied layouts, and correct tonal execution (Chen et al., 30 Dec 2025).
5.2. LLMs
Across creative writing, dialogue, QA, and synthetic math data generation:
- VS improves semantic diversity by 1.6–2.1× in poems, 1.9–2.4× in stories, and up to 3× in jokes. Larger models benefit more (+12 percentage points diversity gain in GPT-4.1 vs +6 in smaller variants).
- VS achieves high-quality generations (equaling or exceeding direct prompting), does not erode factual accuracy or safety (>97% refusal on harmful prompts), and permits a user-tunable diversity-quality tradeoff via explicit probability thresholds, as sketched after this list (Zhang et al., 1 Oct 2025).
- In math data, VS-based synthetic training boosts downstream accuracy on benchmarks by up to +4.7 pp over direct generation.
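One plausible reading of the probability-threshold knob, sketched below under the assumption that candidates whose verbalized probability falls under a tunable floor are simply discarded: a high threshold approaches direct prompting, while a low threshold retains the long tail.

```python
def filter_by_threshold(candidates, tau):
    """Keep verbalized candidates whose stated probability is at least tau."""
    kept = [c for c in candidates if c["probability"] >= tau]
    return kept or candidates[:1]           # fall back to the single modal response

candidates = [                              # e.g. parsed from a VS-Standard reply
    {"text": "modal answer",   "probability": 0.55},
    {"text": "common variant", "probability": 0.25},
    {"text": "rare variant",   "probability": 0.15},
    {"text": "long-tail idea", "probability": 0.05},
]

print(len(filter_by_threshold(candidates, tau=0.5)))   # ~direct prompting: 1 kept
print(len(filter_by_threshold(candidates, tau=0.05)))  # maximal diversity: all 4 kept
```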
6. Comparative Analysis and Limitations
PMC is a distinct form of reward hacking driven by intrinsic reward-model or data biases, rather than by optimization failures alone. The D²-Align approach is plug-and-play (applicable to other RLHF pipelines), requires over an order of magnitude fewer RL steps for alignment, and empirically breaks the fidelity–diversity tradeoff in text-to-image tasks. However, it currently relies on a single, frozen reward model; future improvements may require ensemble or adaptive reward models and higher-order embedding corrections.
In LLMs, typicality bias is both empirically pervasive and theoretically guaranteed to cause diversity collapse for any positive bias weight $\alpha$. VS is model-agnostic and requires only black-box API access, but it increases inference cost and offers less benefit for weaker models. Data-centric alignment, such as pluralistic reward modeling, and more exploration-oriented RL objectives remain open research directions (Zhang et al., 1 Oct 2025).
7. Broader Implications and Future Directions
Preference Mode Collapse highlights a fundamental failure of current alignment paradigms—high automated preference scores do not guarantee genuine diversity or human-like creativity. Methodological proposals such as D²-Align and Verbalized Sampling supply practical mitigation, but further progress depends on more robust preference modeling, ensemble-based or continual adaptation to shifting annotation biases, and explicit modeling of diversity objectives. Extending benchmarks like DivGenBench to richer axes (e.g., object interaction, scene complexity) and cross-domain tasks is an ongoing challenge for the field (Chen et al., 30 Dec 2025, Zhang et al., 1 Oct 2025).