Directional Decoupling Alignment (D²-Align)

Updated 6 January 2026
  • The paper introduces D²-Align, a novel framework that controls Preference Mode Collapse by learning a directional correction vector to adjust the reward signal.
  • D²-Align decouples the generator’s behavior from intrinsic reward model biases through a two-stage training process, ensuring both human preference alignment and output diversity.
  • Empirical results on DivGenBench and human evaluations demonstrate that D²-Align outperforms existing RL baselines in both alignment metrics and diversity scores.

Directional Decoupling Alignment (D²-Align) is a framework for controlling Preference Mode Collapse (PMC) in text-to-image (T2I) diffusion reinforcement learning (RL). PMC arises from the over-optimization of reward models with intrinsic biases, causing models to produce a narrow set of high-reward but low-diversity outputs. D²-Align addresses this by learning a continuous, prompt-embedding-space correction vector applied directionally to the reward signal, decoupling the generator’s behavior from biases in the reward model and preserving both alignment to human preferences and diversity in generated samples (Chen et al., 30 Dec 2025).

1. Preference Mode Collapse and Its Quantification

Preference Mode Collapse (PMC) is a particular manifestation of reward hacking in which RL-fine-tuned T2I diffusion models converge on narrow, reward-favored output modes, such as a single highly stylized “over-exposed” image style, at the expense of diversity. This phenomenon is driven by inherent “favorite” modes in the reward model; naive maximization overfits to these biases, leading to catastrophic loss in generative spread.

Quantification of PMC is provided by DivGenBench, a benchmark of 3,200 prompts that probe four orthogonal diversity axes:

  • Identity (ID): Age, ethnicity, gender, and facial features, sourced from CelebA.
  • Artistic Style (Style): Referenced from painting styles in WikiArt.
  • Layout: Object count and spatial arrangement, using COCO-style metadata.
  • Tonal Properties (Tonal): Saturation, brightness, and contrast levels.

For each dimension, bespoke metrics are defined:

  • Identity (IDS): $\frac{2}{N(N-1)}\sum_{i=1}^{N}\sum_{j>i}\frac{v_i \cdot v_j}{\|v_i\|\,\|v_j\|}$; lower is better.
  • Style (ASC): $\frac{\text{IRS}_\infty(\mathcal{X}_{\mathrm{synth}})}{\text{IRS}_\infty(\mathcal{X}_{\mathrm{test}})}$; higher is better.
  • Layout (SDI): averaged one minus pairwise box-layout similarity (see below); higher is better.
  • Tonal (PVS): $\mathrm{std}(\mathbf{s}) + \mathrm{std}(\mathbf{v}) + \mathrm{std}(\mathbf{c})$; higher is better.

Here, IDS employs face embeddings (ArcFace) to quantify crowding in the identity space, ASC uses a style retrieval process against WikiArt, SDI leverages Grounding DINO for object layout, and PVS is the sum of standard deviations of basic tone statistics.
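As a concrete illustration, the two simplest metrics can be computed directly from per-image statistics. The following minimal Python sketch mirrors the IDS and PVS formulas above; function names, array shapes, and the use of precomputed ArcFace embeddings and tone statistics are assumptions, not the benchmark's reference implementation.

import numpy as np

def identity_divergence_score(face_embeddings: np.ndarray) -> float:
    # IDS: mean pairwise cosine similarity of N face embeddings (lower = more diverse).
    v = face_embeddings / np.linalg.norm(face_embeddings, axis=1, keepdims=True)
    sim = v @ v.T                          # (N, N) cosine-similarity matrix
    i, j = np.triu_indices(len(v), k=1)    # all pairs with i < j
    return float(sim[i, j].mean())         # = 2/(N(N-1)) * sum_{i<j} cos(v_i, v_j)

def photographic_variance_score(saturation, value, contrast) -> float:
    # PVS: sum of standard deviations of per-image tone statistics (higher = more diverse).
    return float(np.std(saturation) + np.std(value) + np.std(contrast))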

2. Mathematical Framework of D²-Align

The framework consists of two main stages, formalized as follows.

Let $G_\theta$ denote the T2I generator (diffusion model) and

$$R(x_0, c) = \mathrm{score}\big(\Phi_{\mathrm{img}}(x_0),\ \Phi_{\mathrm{text}}(c)\big)$$

be the reward model, where $\Phi_{\mathrm{img}}$ and $\Phi_{\mathrm{text}}$ are frozen encoders and $\mathrm{score}(\cdot,\cdot)$ is the cosine similarity between image and prompt embeddings.
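A minimal sketch of such a reward, assuming CLIP-style frozen encoders passed in as `phi_img` and `phi_text` (the names are placeholders, not the paper's API):

import torch
import torch.nn.functional as F

def reward(x0: torch.Tensor, prompts, phi_img, phi_text) -> torch.Tensor:
    # R(x0, c): cosine similarity between frozen image and prompt embeddings.
    e_img = F.normalize(phi_img(x0), dim=-1)         # Φ_img(x0), unit norm
    e_text = F.normalize(phi_text(prompts), dim=-1)  # Φ_text(c), unit norm
    return (e_img * e_text).sum(dim=-1)              # one score per sample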

One-step denoising produces the clean-image estimate $\hat{x}_0$ by sampling $\epsilon_{\mathrm{gt}} \sim \mathcal{N}(0, I)$, forming $x_t = \alpha_t x_0 + \sigma_t \epsilon_{\mathrm{gt}}$, predicting the noise $\hat{\epsilon} = \epsilon_\theta(x_t, t)$, and using

$$\hat{x}_0 = \frac{x_t - \sigma_t \hat{\epsilon}}{\alpha_t}.$$
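A minimal sketch of this one-step estimate, assuming the schedule coefficients `alpha_t`, `sigma_t` and the noise predictor `eps_theta` are given:

import torch

def one_step_denoise(x0: torch.Tensor, t: int, alpha_t: float, sigma_t: float, eps_theta) -> torch.Tensor:
    # Noise the clean image to step t, predict the noise, and recover a clean estimate.
    eps_gt = torch.randn_like(x0)                 # ε_gt ~ N(0, I)
    x_t = alpha_t * x0 + sigma_t * eps_gt         # forward diffusion to step t
    eps_hat = eps_theta(x_t, t)                   # predicted noise ε̂
    return (x_t - sigma_t * eps_hat) / alpha_t    # x̂0 = (x_t - σ_t ε̂) / α_t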

2.1 Stage 1: Learning the Directional Correction Vector

  • Introduce a learnable correction vector $\mathbf{b}_v \in \mathbb{R}^d$; freeze $G_\theta$.
  • For each prompt $c$, define the prompt embeddings:

$$\mathbf{e}_+ = \mathrm{normalize}(\mathbf{e}_{\mathrm{text}} + \mathbf{b}_v), \qquad \mathbf{e}_- = \mathrm{normalize}(\mathbf{e}_{\mathrm{text}} - \mathbf{b}_v)$$

  • Construct the guided embedding with classifier-free-guidance-style extrapolation and scale $\omega > 1$:

$$\tilde{\mathbf{e}}_{\mathrm{text}} = \mathbf{e}_- + \omega\,(\mathbf{e}_+ - \mathbf{e}_-)$$

  • Compute the guided reward:

$$R_{\mathrm{guided}}(x_0, c; \mathbf{b}_v) = \mathrm{score}\big(\Phi_{\mathrm{img}}(x_0),\ \tilde{\mathbf{e}}_{\mathrm{text}}\big)$$

  • Train $\mathbf{b}_v$ to maximize the expected guided reward, i.e., minimize:

$$\mathcal{L}_{\mathrm{stage1}}(\mathbf{b}_v) = \mathbb{E}_{c,\ x_0 \sim G_\theta^{\mathrm{frozen}}}\big[-R_{\mathrm{guided}}(x_0, c; \mathbf{b}_v)\big]$$

  • Optimize $\mathbf{b}_v$ for $T_1$ steps, resulting in $\mathbf{b}_v^*$ (a minimal training sketch follows this list).
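Under the assumptions above (frozen generator and encoders), Stage 1 can be sketched as follows; the optimizer choice, embedding dimension, batch handling, and helpers such as `sample_prompts` are illustrative rather than prescribed by the paper.

import torch
import torch.nn.functional as F

def learn_correction_vector(generator, phi_img, phi_text, sample_prompts,
                            d=768, omega=1.5, steps=2000, lr=1e-3, device="cpu"):
    # Stage 1: learn b_v that maximizes the guided reward while G_theta stays frozen.
    b_v = (0.01 * torch.randn(d, device=device)).requires_grad_(True)
    opt = torch.optim.Adam([b_v], lr=lr)
    for _ in range(steps):
        prompts = sample_prompts()                       # batch of prompts c
        with torch.no_grad():                            # generator and encoders are frozen
            x0 = generator(prompts)
            e_img = F.normalize(phi_img(x0), dim=-1)
            e_text = phi_text(prompts)
        e_plus = F.normalize(e_text + b_v, dim=-1)       # e_+ = normalize(e_text + b_v)
        e_minus = F.normalize(e_text - b_v, dim=-1)      # e_- = normalize(e_text - b_v)
        e_guided = e_minus + omega * (e_plus - e_minus)  # CFG-style extrapolation, ω > 1
        r_guided = (e_img * e_guided).sum(dim=-1)        # guided cosine reward
        loss = -r_guided.mean()                          # minimize -R_guided
        opt.zero_grad()
        loss.backward()
        opt.step()
    return b_v.detach()                                  # b_v*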

2.2 Stage 2: Guided Generator Alignment

  • Freeze $\mathbf{b}_v^*$ and update $G_\theta$ via RL with the guided reward:

$$\mathcal{L}_{\mathrm{stage2}}(\theta) = \mathbb{E}_{c \sim \mathcal{D},\ x_0 \sim G_\theta(c)}\big[-R_{\mathrm{guided}}(x_0, c; \mathbf{b}_v^*)\big]$$

  • The net reward function becomes

$$r'(x, c) = r(x, c) + \Delta r(x, c), \qquad \Delta r(x, c) = \tilde{r}(x, c) - r(x, c)$$

where $\tilde{r}(x, c) = R_{\mathrm{guided}}(x, c; \mathbf{b}_v^*)$. This shapes the reward directionally.

No explicit regularization is used beyond normalization of the embedding vectors and the alternation of which component is frozen in each stage.
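A minimal sketch of the shaped reward as it could be exposed to an existing RL fine-tuning loop in Stage 2; the policy-gradient machinery is assumed to come from whichever RL algorithm is used, and `base_reward` and the encoder handles are placeholders.

import torch
import torch.nn.functional as F

def guided_reward(x0, prompts, b_v_star, phi_img, phi_text, omega=1.5):
    # r~(x, c) = R_guided(x, c; b_v*): cosine reward against the guided prompt embedding.
    e_img = F.normalize(phi_img(x0), dim=-1)
    e_text = phi_text(prompts)
    e_plus = F.normalize(e_text + b_v_star, dim=-1)
    e_minus = F.normalize(e_text - b_v_star, dim=-1)
    e_guided = e_minus + omega * (e_plus - e_minus)
    return (e_img * e_guided).sum(dim=-1)

def shaped_reward(x0, prompts, b_v_star, phi_img, phi_text, base_reward, omega=1.5):
    # r'(x, c) = r(x, c) + Δr(x, c) with Δr = r~ - r; by construction r' equals r~.
    r = base_reward(x0, prompts)
    delta_r = guided_reward(x0, prompts, b_v_star, phi_img, phi_text, omega) - r
    return r + delta_r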

3. Algorithmic Workflow

The two-stage process is concisely formalized in the following workflow:

Initialize b_v ← random
for t in 1…T₁:
    c ← sample(𝒞)
    x ← G_θ(c)
    ε_gt ∼ N(0, I); x_t = α_t x + σ_t ε_gt
    ε_pred = ε_θ(x_t, t); x̂ via one-step denoise
    e_img = Φ_img(x̂); e_text = Φ_text(c)
    e_± = normalize(e_text ± b_v)
    ẽ_text = e_- + ω*(e_+ − e_-)
    R_guided = score(e_img, ẽ_text)
    update b_v ← b_v + η ∇_{b_v}[R_guided]
b_v^* ← b_v

for t in 1…T₂:
    c ← sample(𝒞)
    x ← G_θ(c)
    ε_gt ∼ N(0, I); x_t = α_t x + σ_t ε_gt
    ε_pred = ε_θ(x_t, t); x̂ via one-step denoise
    e_img = Φ_img(x̂); e_text = Φ_text(c)
    e_± = normalize(e_text ± b_v^*)
    ẽ_text = e_- + ω*(e_+ − e_-)
    R_guided = score(e_img, ẽ_text)
    update θ ← θ + η ∇_θ[R_guided]

This two-stage alternation (learning the correction direction on a frozen generator, then applying it while training the generator) distinguishes D²-Align from previous approaches.

4. Empirical Evaluation

Alignment and diversity were quantified using both standard and newly proposed metrics. D²-Align consistently matched or exceeded all RL baselines (DanceGRPO, Flow-GRPO, SRPO, and FLUX) in both reward alignment and diversity on DivGenBench.

4.1 Automated Reward Scores

  • Under the HPS-v2.1 reward, D²-Align achieved best or near-best scores across metrics:
    • Aesthetic: 6.450 (2nd)
    • ImageReward: 1.771 (best)
    • PickScore: 0.246 (best)
    • Q-Align: 4.969 (tied best)
    • CLIP, DeQA, GenEval: among top scores
  • Under HPS-v2.1+CLIP, D²-Align was best on all metrics, including Aesthetic (6.671), ImageReward (1.762), PickScore (0.246), Q-Align (4.970), CLIP Score (0.328), DeQA (4.498), and GenEval (0.660).

4.2 Diversity Results (DivGenBench)

  • Identity Divergence Score (IDS): D²-Align scored 0.251 (HPS-v2.1) and 0.237 (HPS-v2.1+CLIP), the lowest (best) in both cases.
  • Artistic Style Coverage (ASC): 0.253 (HPS-v2.1) and 0.247 (HPS-v2.1+CLIP), the highest (best) in both cases.
  • Spatial Dispersion Index (SDI): 0.636 and 0.631.
  • Photographic Variance Score (PVS): 0.412 and 0.418.

Compared to prior baselines, D²-Align achieved uniformly better diversity and did not trade off preference alignment to obtain it.

4.3 Ablations and Human Evaluation

  • Convergence of the correction vector $\mathbf{b}_v$ occurred within approximately 2,000 steps in Stage 1.
  • The optimal guidance scale was observed at $\omega = 1.5$.
  • A continuous, learned $\mathbf{b}_v$ outperformed discrete token-based alternatives.
  • Incorporating $\mathbf{b}_v^*$ as a plug-in to DanceGRPO improved both alignment and diversity metrics.
  • Human preference studies showed that D²-Align was selected in ~48.2% of overall HPDv2 cases and was preferred on every DivGenBench diversity axis (Identity, Style, Layout, Tonal).

5. Mechanistic Insights and Applicability

D²-Align operates by shifting the direction of the reward gradient in prompt-embedding space, rather than scaling its magnitude. This directional shaping distinguishes it from conventional penalty or regularization schemes and directly decouples the generator's optimization trajectory from the reward-model-favored modes that drive PMC.

Applying D²-Align to other diffusion RL tasks involves the same general workflow:

  1. Freeze the generative policy and learn a prompt-embedding correction vector on ground-truth or human-labeled data.
  2. Freeze the correction vector and RL-finetune the generator under the guided reward.
  3. Evaluate outputs for both alignment (automated/human) and diversity (DivGenBench or analogous metrics).

A plausible implication is that directional decoupling constitutes a general countermeasure against reward-model “mode biases” across a range of alignment domains, not limited to text-to-image diffusion RL.

6. Significance and Perspectives

Directional Decoupling Alignment provides an operational methodology for preserving the diversity of generative models while maintaining high human-preference scores, explicitly breaking the quality-diversity trade-off that commonly afflicts RL from human feedback in diffusion models. By leveraging a learned, continuous correction vector in embedding space, D²-Align remains free of hand-designed regularizers or constraints.

Overall, the approach demonstrates that shaping the reward in direction (not just in value) is a viable and practical strategy for addressing preference-induced collapse (Chen et al., 30 Dec 2025).

References (1)
