DiffusionNFT: Negative-Aware FineTuning

Updated 22 September 2025
  • The paper introduces a novel paradigm that integrates negative signals into diffusion model tuning to mitigate negative transfer and enhance sample fidelity.
  • The methodology employs multi-task learning, diffusion-negative prompting, and negative preference optimization to balance positive and negative feedback during training.
  • Empirical evaluations reveal significant improvements in precision, recall, and efficiency, including up to 25× faster training in forward-process reinforcement learning.

Diffusion Negative-aware FineTuning (DiffusionNFT) is a paradigm for optimizing diffusion models that targets improved controllability and sample quality by explicitly incorporating negative information or preferences into the fine-tuning process. This approach systematically leverages negative signals—ranging from negative transfer mitigation and negative preference optimization to forward-process reinforcement using both positive and negative samples—to address issues of preference alignment, negative transfer, and efficient reinforcement learning in high-dimensional generative models.

1. Motivation and Conceptual Foundations

DiffusionNFT originates from the observation that standard fine-tuning of diffusion models can suffer from negative transfer, model collapse, and suboptimal alignment with human intent due to insufficient modeling of undesirable outcomes or conflicting learning signals. In multi-task denoising scenarios, negative transfer occurs when learning one denoising task degrades performance on another due to conflicting gradients, particularly as the gap between tasks (i.e., noise levels or timesteps) increases (Go et al., 2023).

A broader motivation is the limitation of classifier-free guidance (CFG): it relies on the difference between conditional and unconditional (or negative conditional) generation, yet previous preference alignment and RL approaches often neglect optimization of the unconditional/negative branch, resulting in poor control over undesirable outputs (Wang et al., 16 May 2025). Introducing explicit negative-aware strategies allows diffusion models to better distinguish and avoid undesirable outcomes.

2. Methodologies for Negative-aware FineTuning

DiffusionNFT encompasses several complementary methodologies, each rooted in the explicit modeling or utilization of negative information:

  • Multi-task Learning (MTL) with Negative Transfer Mitigation: Denoising tasks corresponding to different noise levels are clustered by temporal, SNR, or gradient affinity using dynamic programming. MTL techniques such as PCGrad, Nash-MTL, and Uncertainty Weighting are then applied at the cluster level to reduce negative transfer (Go et al., 2023).
  • Diffusion-negative Prompting (DNP) and Diffusion-negative Sampling (DNS): DNS inverts the standard CFG-based guidance mechanism, encouraging sampling of images that are least compliant with a given prompt. The resulting “negative” images are then translated (manually or via captioning models) into natural language prompts (n*), which, when paired with the original prompt p, improve prompt adherence and sample fidelity (Desai et al., 8 Nov 2024).
  • Negative Preference Optimization (NPO): NPO explicitly trains a model offset (δ) representing negative preferences by inverting preference labels or reward functions. In CFG, the negative branch is parameterized with this offset to more effectively discourage undesirable outcomes, improving alignment with human preferences and enhancing the contrast required for effective guidance (Wang et al., 16 May 2025).
  • Online RL with Forward-Process Flow Matching: The most recent instantiation of DiffusionNFT eschews reverse-process policy gradients in favor of a forward-process, flow-matching objective. The policy improvement direction is defined by contrasting the velocity predictions for positive- and negative-rewarded trajectories, directly updating the model's velocity (score) function via

$$v^*(x,t) = v^{(\text{old})}(x,t) + \frac{1}{\beta}\,\Delta(x,t),$$

where $\Delta(x,t)$ is computed using positive and negative data. This enables RL directly on the supervised flow-matching loss, decoupling training from solver restrictions and eliminating likelihood estimation (Zheng et al., 19 Sep 2025).
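
A minimal sketch of this forward-process update is given below, assuming a rectified-flow (velocity-prediction) parameterization. The function names, the reward-thresholded split into `x_pos`/`x_neg`, and the exact form of the $\Delta$ term are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def diffusion_nft_step(v_theta, v_old, x_pos, x_neg, beta, optimizer):
    """One schematic negative-aware flow-matching update (illustrative only).

    x_pos / x_neg: generated samples whose rewards fell above / below a threshold.
    v_theta: trainable velocity network; v_old: frozen snapshot of the pre-update policy.
    """
    def interpolate(x0):
        # Rectified-flow interpolation x_t = (1 - t) x0 + t eps, target velocity eps - x0.
        t = torch.rand(x0.size(0), 1, 1, 1, device=x0.device)
        eps = torch.randn_like(x0)
        return (1.0 - t) * x0 + t * eps, t, eps - x0

    xt_p, t_p, u_p = interpolate(x_pos)
    xt_n, t_n, u_n = interpolate(x_neg)

    with torch.no_grad():
        v_old_p = v_old(xt_p, t_p)   # old-policy velocities anchor the update
        v_old_n = v_old(xt_n, t_n)

    # Schematic Delta: move toward positive-sample targets and away from negative ones,
    # scaled by 1/beta relative to the old policy, echoing v* = v_old + (1/beta) * Delta.
    target_p = v_old_p + (1.0 / beta) * (u_p - v_old_p)
    target_n = v_old_n - (1.0 / beta) * (u_n - v_old_n)

    loss = F.mse_loss(v_theta(xt_p, t_p), target_p) + F.mse_loss(v_theta(xt_n, t_n), target_n)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the update reduces to a plain flow-matching regression, it can reuse the standard supervised training loop, which is what decouples it from sampler and likelihood constraints.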

3. Technical Implementation Strategies

The implementation details for DiffusionNFT depend on the variant but share common threads:

| Component | Description | Example Reference |
|---|---|---|
| Task Clustering | Interval clustering via dynamic programming; cost based on timestep, SNR, or gradient affinity. | (Go et al., 2023) |
| Negative-guided Sampling | DNS flips the guidance direction in the diffusion update: $\hat\epsilon = \hat\epsilon_\phi + s(\hat\epsilon_\phi - \hat\epsilon_p)$. | (Desai et al., 8 Nov 2024) |
| Offset Parameterization | Negative-branch weights $\theta_{\text{neg}} = \theta + \alpha\eta + \beta\delta$, with positive offset $\eta$ and negative offset $\delta$. | (Wang et al., 16 May 2025) |
| Forward-Process RL Objective | Policy improvement $v^*(x,t) = v^{(\text{old})}(x,t) + \frac{1}{\beta}\Delta(x,t)$, with $\Delta$ computed by comparing positive- and negative-sample velocity fields. | (Zheng et al., 19 Sep 2025) |
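
As a concrete illustration of the task-clustering row above, the following is a minimal sketch of interval clustering over denoising timesteps via 1-D dynamic programming. The sum-of-squared-deviation cost over a per-timestep scalar feature (e.g., log-SNR) is an illustrative choice; the cited work also considers timestep- and gradient-affinity-based costs, and all names here are hypothetical.

```python
import numpy as np

def cluster_timesteps(feature, num_clusters):
    """Partition timesteps 0..T-1 into contiguous intervals by dynamic programming.

    feature: per-timestep scalar (e.g., log-SNR); the within-interval cost used here
    is the sum of squared deviations from the interval mean (illustrative choice).
    Returns a list of (start, end) index pairs with end exclusive.
    """
    T = len(feature)
    prefix = np.concatenate([[0.0], np.cumsum(feature)])
    prefix_sq = np.concatenate([[0.0], np.cumsum(np.square(feature))])

    def interval_cost(i, j):
        # Cost of grouping timesteps [i, j): sum of squared deviations from the mean.
        n = j - i
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / n

    dp = np.full((num_clusters + 1, T + 1), float("inf"))
    split = np.zeros((num_clusters + 1, T + 1), dtype=int)
    dp[0, 0] = 0.0
    for k in range(1, num_clusters + 1):
        for j in range(k, T + 1):
            for i in range(k - 1, j):
                cand = dp[k - 1, i] + interval_cost(i, j)
                if cand < dp[k, j]:
                    dp[k, j], split[k, j] = cand, i

    # Backtrack the optimal interval boundaries.
    bounds, j = [], T
    for k in range(num_clusters, 0, -1):
        i = split[k, j]
        bounds.append((int(i), int(j)))
        j = i
    return bounds[::-1]
```

Each resulting interval is treated as one task cluster, and per-cluster gradients then feed PCGrad, Nash-MTL, or Uncertainty Weighting as described next.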

Dynamic programming allows efficient clustering of hundreds of denoising tasks. PCGrad, Nash-MTL, and Uncertainty Weighting are adapted to the clustered setting, with gradients computed per cluster rather than per task to control computational cost. In DNP/DNS, guidance inversion (flipping the CFG direction) or prompt swapping is implemented in standard sampling routines, requiring no retraining.
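
A minimal sketch of such a sampling step is shown below, assuming an epsilon-prediction model with a standard CFG interface; the function signature is hypothetical.

```python
import torch

@torch.no_grad()
def guided_epsilon(model, x_t, t, prompt_emb, null_emb, scale, dns=False):
    """Guided noise prediction for one sampling step.

    Standard CFG:  eps = eps_uncond + scale * (eps_cond - eps_uncond)
    DNS (flipped): eps = eps_uncond + scale * (eps_uncond - eps_cond),
    which steers sampling *away* from the prompt to surface diffusion-negative images.
    `model(x_t, t, conditioning)` is assumed to return an epsilon prediction.
    """
    eps_cond = model(x_t, t, prompt_emb)
    eps_uncond = model(x_t, t, null_emb)
    if dns:
        return eps_uncond + scale * (eps_uncond - eps_cond)  # DNS: inverted guidance
    return eps_uncond + scale * (eps_cond - eps_uncond)      # standard CFG
```

The diffusion-negative images sampled this way are then captioned into n* and paired with the original prompt p for ordinary CFG sampling, as described in Section 2.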

For NPO, gradient updates are computed using inverted preference pairs or flipped reward models; at inference, the conditional and negative-conditional branches are evaluated with different parameter offsets in the CFG formula. In the online RL setting, the velocity-matching objective can be implemented with standard flow-matching code, updating model weights from the reward-partitioned positive and negative trajectories.
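
The NPO inference-time combination can be sketched as follows, assuming the offsets $\eta$ and $\delta$ are stored as per-parameter tensors keyed by name; the model interface and the way the negative branch is materialized are illustrative assumptions, not the paper's exact implementation.

```python
import copy
import torch

@torch.no_grad()
def npo_guided_epsilon(base_model, eta, delta, x_t, t, prompt_emb, null_emb,
                       alpha, beta, scale):
    """CFG with an offset-parameterized negative branch (schematic NPO inference).

    eta / delta: dicts of per-parameter offsets learned for positive / negative
    preferences, keyed by parameter name. The negative branch uses
    theta_neg = theta + alpha * eta + beta * delta, matching the table above.
    """
    # Conditional branch: base weights theta with prompt conditioning.
    eps_cond = base_model(x_t, t, prompt_emb)

    # Negative branch: clone the model and shift its weights by the learned offsets.
    neg_model = copy.deepcopy(base_model)
    for name, p in neg_model.named_parameters():
        p.add_(alpha * eta[name] + beta * delta[name])
    eps_neg = neg_model(x_t, t, null_emb)

    # Standard CFG combination, now contrasting against the NPO-shifted negative branch.
    return eps_neg + scale * (eps_cond - eps_neg)
```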

4. Empirical Performance and Evaluation

Numerous empirical studies demonstrate the efficacy of DiffusionNFT approaches:

  • Negative Transfer Mitigation: Interval clustering with MTL integration yields lower FID, improved precision/recall, and faster convergence than standard joint denoising training (Go et al., 2023).
  • Diffusion-negative Prompting: DNP improves prompt adherence and image realism, with human evaluators preferring (p, n*) prompt pairs over p alone. Quantitative metrics (e.g., CLIP score, Inception Score) consistently improve (Desai et al., 8 Nov 2024).
  • NPO Enhancement: NPO-augmented models yield higher scores on Pick-a-pic, HPSv2, and ImageReward metrics, with improved image sharpness, color, and composition. Users express clear preference in human evaluations (Wang et al., 16 May 2025).
  • Forward-process RL: Online DiffusionNFT achieves up to 25× greater training efficiency than FlowGRPO, reaching a GenEval score of 0.98 within 1k steps (vs. 0.95 after more than 5k steps for FlowGRPO), and outperforms or matches larger CFG-guided models across rule-based and model-based reward metrics (Zheng et al., 19 Sep 2025).

5. Applications and Deployment Scenarios

DiffusionNFT methods are applicable in various domains requiring robust generative modeling with fine-grained control:

  • Text-to-image and text-to-video generation: Improved preference alignment (e.g., in SD1.5, SDXL, VADER) and avoidance of undesirable outputs.
  • Fairness and bias mitigation: Control of demographic attribute distributions via adjusted fine-tuning losses (Shen et al., 2023).
  • Safe and efficient deployment: Unified bilevel optimization for pruned models supports resource-limited environments while enabling concept suppression (Shirkavand et al., 19 Dec 2024).
  • Online RL for generative policy improvement: Rapid adaptation to dynamic objectives or reward functions (e.g., multi-reward joint training of SD3.5-Medium) (Zheng et al., 19 Sep 2025).

6. Challenges and Future Directions

Several technical and practical challenges are noted:

  • Semantic gap: Diffusion-negative images are often unintuitive in human terms; bridging the gap for interpretable negative prompts (e.g., via advanced captioning models) remains open (Desai et al., 8 Nov 2024).
  • Seed sensitivity and stability: The stochastic nature of negative sampling can produce variable results; robustness strategies are required.
  • Guidance scale tuning: Proper settings for guidance strength remain crucial for balancing fidelity and suppression of undesirable features (Yoon et al., 4 Jul 2024).
  • Scalability: Extension to new modalities (audio, text), more complex task structures, and efficient management of negative preference modeling at scale.

Future directions include refinement of negative representation learning, adaptive guidance strategies, and further theoretical development of flow-matching-based RL for generative modeling. Integration of multi-modal and multi-objective reward signals and exploration of hybrid supervision (combining human and model-derived negatives) are plausible avenues.

7. Summary Table: DiffusionNFT Variants

| Approach / Variant | Negative Information Source | Main Mechanism | Domain |
|---|---|---|---|
| Interval Clustering + MTL | Task affinity, gradient conflict | Clustered MTL regularization | Generic denoising |
| Diffusion-negative Prompting | Negative samples via inverted CFG | DNS + natural-language translation | Text-to-image |
| Negative Preference Optimization | Inverted rewards / preference pairs | Weight offsets in CFG branches | Image, video, multi-modal |
| Forward-process RL (flow matching) | Positive/negative reward partitions | $\Delta$-policy update via flow matching | Image generation |

These frameworks collectively define the Diffusion Negative-aware FineTuning paradigm, which unifies negative information modeling for improved, robust, and controllable diffusion-based generation across domains.
