DiffusionNFT: Negative-Aware Finetuning

Updated 4 July 2026
  • DiffusionNFT is a paradigm that integrates negative signals such as forbidden concepts and low-reward samples into the finetuning process of diffusion models.
  • It employs diverse formulations including bilevel concept suppression and online reinforcement learning to enhance model reliability and efficiency.
  • Empirical studies indicate that DiffusionNFT improves generation quality, reduces negative behaviors, and achieves efficient adaptation across various diffusion tasks.

Diffusion Negative-aware Finetuning (DiffusionNFT) denotes a class of diffusion-model adaptation procedures in which training is explicitly informed by negative evidence—such as forbidden concepts, low-reward generations, or human-dispreferred outputs—rather than relying only on positive reconstruction or preference signals. In the literature covered here, the term spans multiple closely related but technically distinct formulations: a supervised bilevel scheme that unifies fine-tuning and concept suppression for pruned diffusion models, an online reinforcement-learning formulation that optimizes flow-matching models directly on the forward process, and later extensions for value estimation, preference alignment, and instruction-based image editing (Shirkavand et al., 2024, Zheng et al., 19 Sep 2025, Go et al., 19 May 2026, Wang et al., 16 May 2025, Li et al., 19 Oct 2025). This suggests an umbrella usage centered on negative-aware post-training rather than a single canonical loss.

1. Terminological scope and conceptual background

Across these works, “negative-aware” consistently refers to training rules that do not treat all generated or supervised targets symmetrically. Instead, they incorporate explicit avoidance signals: a concept to suppress, a low-reward sample to push away from, or a negative-preference branch used to sharpen classifier-free guidance. What changes across papers is the object being optimized: a noise predictor in conditional diffusion, a velocity field in flow matching, or a separate negative branch used during guidance (Shirkavand et al., 2024, Zheng et al., 19 Sep 2025, Wang et al., 16 May 2025).

A broader conceptual backdrop is the multi-task view of diffusion training. “Addressing Negative Transfer in Diffusion Models” treats denoising across timesteps as multi-task learning, reports that task affinity decreases as the timestep or signal-to-noise-ratio gap widens, and shows that negative transfer can occur even in standard diffusion training. Its interval-clustering and cluster-level MTL machinery were presented as a finetuning-friendly way to protect vulnerable timesteps during adaptation. This is not the same algorithmic family as later DiffusionNFT papers, but it provides an important interpretation of what “negative-aware” can mean in diffusion optimization: not only suppressing unwanted content, but also preventing harmful interference across denoising tasks (Go et al., 2023).

| Use in the literature | Object updated | Negative signal |
| --- | --- | --- |
| Controlled fine-tuning for pruned diffusion models | Pruned student noise predictor | Forbidden concept and anchor concept |
| Online diffusion RL on the forward process | Velocity field / denoiser | Low-reward generations via optimality probability |
| Negative-preference optimization for CFG | Negative-conditional branch | Reversed preference pairs or $1-R$ reward |
| Instruction-based editing post-training | Flow-matching image editor | MLLM-derived reward with group filtering |

2. Bilevel controlled fine-tuning and concept suppression

A prominent early instantiation appears in “Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models,” which maps directly onto DiffusionNFT even though the paper itself refers to “our bilevel method” and “controlled fine-tuning.” The motivation is specific: pruning reduces compute and improves deployability, but the subsequent distillation step can reintroduce undesirable behaviors from the teacher, including copyrighted styles or NSFW concepts, even when such instances are absent from the fine-tuning dataset. The paper argues that a naïve two-stage pipeline—first restoring quality, then running concept unlearning—is both inefficient and suboptimal because parameters good for fine-tuning need not be good for unlearning, and unlearning can damage generation quality (Shirkavand et al., 2024).

The proposed solution is a bilevel optimization in which the lower level restores the pruned model and the upper level suppresses a specified concept. With teacher noise predictor $\epsilon_T$, pruned student $\epsilon_S$, positive data $D_f$, forbidden concept $c$, and anchor concept $c'$, the lower-level objective combines diffusion regression with output-level and feature-level distillation:

$$L_{\mathrm{ft}}(\theta)=L_{\mathrm{Diff}}(\theta)+\lambda_{\mathrm{OutKD}}L_{\mathrm{OutKD}}(\theta)+\lambda_{\mathrm{FeatKD}}L_{\mathrm{FeatKD}}(\theta).$$

The upper-level suppression term uses anchor distillation:

$$L_{\mathrm{unlearn}}(\theta)=\mathbb{E}\big[\|\epsilon_T(x_t,t,c')-\epsilon_S(x_t,t,c)\|^2\big].$$
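The two objectives above can be sketched numerically as follows. This is a minimal illustration in plain Python, assuming noise predictions are flat lists of floats and that the individual loss terms have already been computed. The function names (`squared_l2`, `finetune_loss`, `unlearn_loss`) and the default weights are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared L2 distance between two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def finetune_loss(l_diff: float, l_out_kd: float, l_feat_kd: float,
                  lam_out: float = 1.0, lam_feat: float = 1.0) -> float:
    """Lower-level objective: L_Diff + lam_out * L_OutKD + lam_feat * L_FeatKD.

    The default weights are placeholders, not the values used in the paper.
    """
    return l_diff + lam_out * l_out_kd + lam_feat * l_feat_kd


def unlearn_loss(eps_teacher_anchor: Sequence[float],
                 eps_student_forbidden: Sequence[float]) -> float:
    """Upper-level anchor distillation for one sample.

    Compares the teacher prediction under the anchor concept c' with the
    student prediction under the forbidden concept c.
    """
    return squared_l2(eps_teacher_anchor, eps_student_forbidden)
```

In a real training loop the expectation in $L_{\mathrm{unlearn}}$ would be approximated by averaging `unlearn_loss` over a batch of noisy latents and timesteps.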

The constrained problem is then relaxed into a penalty-based minimax objective and solved with a double-loop first-order algorithm rather than implicit differentiation or truncated backpropagation (Shirkavand et al., 2024).
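The structure of a penalty-based, double-loop first-order scheme can be illustrated on a toy scalar problem. The sketch below is a generic value-function penalty, assuming a lower-level loss $f(\theta)=(\theta-a)^2$ and an upper-level loss $g(\theta)=(\theta-b)^2$; it is not the paper's implementation, and every name and hyperparameter in it is hypothetical.

```python
from typing import Tuple


def grad_f(theta: float, a: float = 1.0) -> float:
    """Gradient of the toy lower-level loss f(theta) = (theta - a)^2."""
    return 2.0 * (theta - a)


def grad_g(theta: float, b: float = 3.0) -> float:
    """Gradient of the toy upper-level loss g(theta) = (theta - b)^2."""
    return 2.0 * (theta - b)


def penalty_double_loop(lam: float = 9.0, outer_steps: int = 200,
                        inner_steps: int = 5, lr_outer: float = 0.02,
                        lr_inner: float = 0.1) -> Tuple[float, float]:
    """Minimize g(theta) + lam * (f(theta) - min_phi f(phi)) with two loops."""
    theta, phi = 0.0, 0.0
    for _ in range(outer_steps):
        # Inner loop: phi tracks the lower-level minimizer with plain
        # gradient steps, giving an estimate of min_phi f(phi).
        for _ in range(inner_steps):
            phi -= lr_inner * grad_f(phi)
        # Outer step: f(phi) is constant with respect to theta, so the
        # update needs only first-order gradients of g and f at theta.
        theta -= lr_outer * (grad_g(theta) + lam * grad_f(theta))
    return theta, phi
```

With the defaults, `theta` settles at $(b+\lambda a)/(1+\lambda)=1.2$, between the lower-level optimum $a=1$ and the upper-level optimum $b=3$; increasing `lam` pulls it closer to the lower-level optimum. In this toy case the inner loop is decoupled from the outer update, whereas in a shared-parameter network the two interact.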

This formulation is negative-aware in a precise sense. The student is not merely asked to fit $D_f$ well; it is also trained so that conditioning on the forbidden concept $c$ moves its prediction toward the teacher’s prediction for the anchor concept, $\epsilon_T(x_t,t,c')$. The paper also evaluates an ESD-style upper-level loss in which the target is derived from classifier-free guidance structure:

ϵT\epsilon_T1

This replaces simple omission of unwanted data with active redirection away from the target concept (Shirkavand et al., 2024).
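As a rough illustration of what "active redirection" means for an ESD-style target, the sketch below uses the form commonly associated with ESD, in which the unconditional prediction is pushed away from the forbidden-concept prediction. This form is an assumption here: the function name, the argument names, and the strength parameter `eta` are hypothetical, and the exact target used in the paper may differ.

```python
from typing import List, Sequence


def esd_style_target(eps_uncond: Sequence[float],
                     eps_forbidden: Sequence[float],
                     eta: float = 1.0) -> List[float]:
    """Assumed ESD-style target: eps_uncond - eta * (eps_forbidden - eps_uncond).

    Both inputs are teacher predictions at the same noisy latent and timestep.
    The result points away from the direction that classifier-free guidance
    would follow toward the forbidden concept.
    """
    if len(eps_uncond) != len(eps_forbidden):
        raise ValueError("predictions must have the same length")
    return [u - eta * (c - u) for u, c in zip(eps_uncond, eps_forbidden)]
```

Setting `eta = 0` reduces the target to the unconditional prediction, which corresponds to using the unconditional branch as the anchor.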

The implementation details reinforce the intended deployment setting. The teacher is Stable Diffusion 2.1; the student is a pruned SD2.1 U-Net expert obtained with APTP; the text encoder remains frozen; the fine-tuning set is MS-COCO-2017; and the main experiments use an expert at 80% MAC budget. The reported weights are diffusion loss ϵT\epsilon_T2, output KD ϵT\epsilon_T3, and feature KD ϵT\epsilon_T4, with ϵT\epsilon_T5, ϵT\epsilon_T6 lower steps per upper step, AdamW, lower learning rate ϵT\epsilon_T7, upper learning rate ϵT\epsilon_T8, and about ϵT\epsilon_T9 total iterations (Shirkavand et al., 2024).

3. Forward-process online RL formulation

A later and more explicit use of the name appears in “DiffusionNFT: Online Diffusion Reinforcement with Forward Process,” which defines Diffusion Negative-aware FineTuning as an online RL paradigm for diffusion and flow-matching models. The core departure from GRPO-style reverse-process training is that optimization is performed directly on the forward process, without stepwise reverse likelihoods, and with arbitrary black-box solvers for sampling. The method is framed in velocity space under flow matching, with forward path

ϵS\epsilon_S0

and target vector field

ϵS\epsilon_S1

Under rectified flow, this reduces to the familiar linear interpolation regime used by later editing work as well (Zheng et al., 19 Sep 2025, Li et al., 19 Oct 2025).
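The rectified-flow special case mentioned above can be sketched as follows. The convention used here is an assumption: $t=0$ is clean data, $t=1$ is pure noise, the forward path is the linear interpolation $x_t=(1-t)x_0+t\epsilon$, and the regression target is its time derivative $\epsilon-x_0$. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def rectified_flow_pair(x0: Sequence[float], noise: Sequence[float],
                        t: float) -> Tuple[List[float], List[float]]:
    """Return (x_t, v_target) for a linear forward path.

    x_t      = (1 - t) * x0 + t * noise
    v_target = d x_t / d t = noise - x0   (independent of t)
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(x0) != len(noise):
        raise ValueError("x0 and noise must have the same length")
    x_t = [(1.0 - t) * a + t * e for a, e in zip(x0, noise)]
    v_target = [e - a for a, e in zip(x0, noise)]
    return x_t, v_target
```

Because only a clean sample, a noise draw, and a timestep are needed, training pairs of this kind can be built from final images alone, without any stored sampling trajectory.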

The negative-aware mechanism is implemented by splitting samples into implicit positive and negative branches using an optimality probability ϵS\epsilon_S2 derived from reward-normalized generations. Let ϵS\epsilon_S3 be the frozen data-collection policy and ϵS\epsilon_S4 the current model. The method defines

ϵS\epsilon_S5

and optimizes

ϵS\epsilon_S6

The paper’s theory interprets this as an implicit policy-improvement operator in forward vector-field space: high-reward samples pull the model toward the positive branch, while low-reward samples create an explicit repulsive term through the negative branch. No reverse-process likelihood estimation is required, and training needs only final clean images rather than stored sampling trajectories (Zheng et al., 19 Sep 2025).
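One way to make the positive/negative split concrete is sketched below. It assumes that both implicit branches are linear combinations of the frozen data-collection prediction and the current prediction, controlled by a mixing coefficient `beta`, and that the optimality probability `r` weights the two regression errors. These specific forms, along with all names, are assumptions made for illustration and should be checked against the paper before use.

```python
from typing import Sequence


def nft_loss(v_theta: Sequence[float], v_old: Sequence[float],
             v_target: Sequence[float], r: float, beta: float = 1.0) -> float:
    """Per-sample negative-aware loss under the assumed parameterization.

    v_pos = (1 - beta) * v_old + beta * v_theta   (implicit positive branch)
    v_neg = (1 + beta) * v_old - beta * v_theta   (implicit negative branch)
    loss  = r * ||v_pos - v_target||^2 + (1 - r) * ||v_neg - v_target||^2
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    v_pos = [(1.0 - beta) * o + beta * n for o, n in zip(v_old, v_theta)]
    v_neg = [(1.0 + beta) * o - beta * n for o, n in zip(v_old, v_theta)]
    pos_err = sum((p - t) ** 2 for p, t in zip(v_pos, v_target))
    neg_err = sum((q - t) ** 2 for q, t in zip(v_neg, v_target))
    # r near 1: the current model is pulled toward the sample's target.
    # r near 0: the negative branch mirrors v_theta around v_old, so fitting
    # it to the target moves v_theta away from that target.
    return r * pos_err + (1.0 - r) * neg_err
```

When `v_theta` equals `v_old`, both branches coincide and the loss reduces to an ordinary flow-matching error regardless of `r`, which is a useful sanity check for an implementation.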

This formulation is technically distinct from the bilevel concept-suppression variant. It is CFG-free, solver-agnostic, and naturally off-policy through an EMA-style update of the data-collection policy. It also shifts the meaning of “negative-aware” from targeted erasure to reward-weighted avoidance of bad generations. In head-to-head comparisons, the paper reports that DiffusionNFT is up to ϵS\epsilon_S7 more efficient than FlowGRPO, reaches GenEval ϵS\epsilon_S8 within ϵS\epsilon_S9 steps from a CFG-free SD3.5-Medium baseline of DfD_f0, and improves OCR from DfD_f1 to DfD_f2, PickScore from DfD_f3 to DfD_f4, CLIPScore from DfD_f5 to DfD_f6, HPSv2.1 from DfD_f7 to DfD_f8, Aesthetics from DfD_f9 to cc0, ImageReward from cc1 to cc2, and UnifiedReward from cc3 to cc4 (Zheng et al., 19 Sep 2025).
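The EMA-style refresh of the data-collection policy can be pictured with the following sketch, assuming parameters are stored as a flat list of floats. The function name and the `decay` value are hypothetical; the paper's actual schedule is not reproduced here.

```python
from typing import List, Sequence


def ema_update(old_params: Sequence[float], new_params: Sequence[float],
               decay: float = 0.9) -> List[float]:
    """Blend the data-collection parameters toward the current model.

    decay = 1.0 keeps the old policy frozen (fully off-policy);
    decay = 0.0 copies the current model (fully on-policy).
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    if len(old_params) != len(new_params):
        raise ValueError("parameter lists must have the same length")
    return [decay * o + (1.0 - decay) * n
            for o, n in zip(old_params, new_params)]
```

The `decay` knob is what trades early speed against later stability: a small value makes the sampler follow the trained model closely, a large value keeps it conservative.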

4. Negative signals, reward estimation, and guidance design

One major axis of variation in DiffusionNFT is the source of the negative signal. In the bilevel concept-suppression formulation, negativity is specified symbolically through forbidden prompts, artist names, NSFW phrases, and anchor concepts such as “art,” “person,” or the unconditional branch. The upper-level loss is therefore concept-directed and local to a specified unlearning target (Shirkavand et al., 2024).

In forward-process RL, negativity is implicit in reward. The model samples images, receives scalar evaluations, and converts them into an optimality probability by per-prompt centering, scaling, clipping, and affine remapping to cc5. The negative branch is then weighted by cc6 rather than by a manually curated prompt list. “Stitched Value Model for Diffusion Alignment” extends this setup by addressing the central difficulty that rewards are naturally defined on clean images, while DiffusionNFT training and guidance require value estimates at noisy latents. That paper contrasts two existing approximations—Tweedie-style posterior-mean evaluation and Monte Carlo rollouts—and proposes StitchVM, which stitches a frozen diffusion backbone head to a truncated pixel-space reward model tail so that a value function cc7 can be amortized directly on noisy latents. In the reported SD3.5-Medium setup, DiffusionNFT requires cc8 GPU-hours, whereas DiffusionNFT plus StitchVM requires cc9 GPU-hours, with nearly identical DrawBench metrics such as HPSv2 cc'0 versus cc'1, DFN cc'2 versus cc'3, ImageReward cc'4 versus cc'5, PickScore cc'6 versus cc'7, and GenEval cc'8 versus cc'9 (Go et al., 19 May 2026).
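The reward-to-probability conversion described at the start of this paragraph (per-prompt centering, scaling, clipping, and affine remapping) can be sketched as below. The choices of the population standard deviation as the scale, a clip range of $[-1, 1]$, and a target interval of $[0, 1]$ are assumptions made for illustration, as is the function name.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def optimality_probabilities(raw_rewards: Sequence[float],
                             eps: float = 1e-6) -> List[float]:
    """Map the raw rewards of one prompt's sample group into [0, 1].

    Steps: center by the group mean, scale by the group standard deviation
    (plus eps for safety), clip to [-1, 1], then remap affinely to [0, 1].
    """
    if not raw_rewards:
        raise ValueError("need at least one reward")
    mu = mean(raw_rewards)
    scale = pstdev(raw_rewards) + eps
    probs = []
    for reward in raw_rewards:
        z = max(-1.0, min(1.0, (reward - mu) / scale))
        probs.append(0.5 + 0.5 * z)
    return probs
```

A group whose rewards are all identical maps to 0.5 everywhere, so it contributes no preference between the positive and negative branches.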

A third variant appears in “Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models.” Here the negative signal is not a forbidden concept or a scalar penalty on generated samples, but a dedicated negative-preference model used as the unconditional or negative-conditional branch in classifier-free guidance. The method keeps the base preference-optimization objective unchanged and trains a separate offset Lft(θ)=LDiff(θ)+λOutKDLOutKD(θ)+λFeatKDLFeatKD(θ).L_{\mathrm{ft}}(\theta)=L_{\mathrm{Diff}}(\theta)+\lambda_{\mathrm{OutKD}}L_{\mathrm{OutKD}}(\theta)+\lambda_{\mathrm{FeatKD}}L_{\mathrm{FeatKD}}(\theta).0 by reversing preference pairs for DPO/SPO-style methods or by replacing the reward with Lft(θ)=LDiff(θ)+λOutKDLOutKD(θ)+λFeatKDLFeatKD(θ).L_{\mathrm{ft}}(\theta)=L_{\mathrm{Diff}}(\theta)+\lambda_{\mathrm{OutKD}}L_{\mathrm{OutKD}}(\theta)+\lambda_{\mathrm{FeatKD}}L_{\mathrm{FeatKD}}(\theta).1 for RL/DR methods. Inference uses

Lft(θ)=LDiff(θ)+λOutKDLOutKD(θ)+λFeatKDLFeatKD(θ).L_{\mathrm{ft}}(\theta)=L_{\mathrm{Diff}}(\theta)+\lambda_{\mathrm{OutKD}}L_{\mathrm{OutKD}}(\theta)+\lambda_{\mathrm{FeatKD}}L_{\mathrm{FeatKD}}(\theta).2

with Lft(θ)=LDiff(θ)+λOutKDLOutKD(θ)+λFeatKDLFeatKD(θ).L_{\mathrm{ft}}(\theta)=L_{\mathrm{Diff}}(\theta)+\lambda_{\mathrm{OutKD}}L_{\mathrm{OutKD}}(\theta)+\lambda_{\mathrm{FeatKD}}L_{\mathrm{FeatKD}}(\theta).3. This makes the negative branch itself preference-aware, rather than merely unconditional (Wang et al., 16 May 2025).
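The role of a trained negative branch in guidance can be illustrated with the standard classifier-free guidance combination, in which the prediction that normally comes from the unconditional pass is supplied by the negative-preference model instead. The standard CFG form is assumed here; the function and argument names are hypothetical, and the paper's exact inference rule is not reproduced.

```python
from typing import List, Sequence


def guided_prediction(eps_pos: Sequence[float], eps_neg: Sequence[float],
                      w: float) -> List[float]:
    """Standard CFG combination: eps_neg + w * (eps_pos - eps_neg).

    eps_pos: prediction of the preference-aligned model under the prompt.
    eps_neg: prediction of the negative-preference branch.
    w:       guidance weight; w = 1 returns eps_pos unchanged.
    """
    if len(eps_pos) != len(eps_neg):
        raise ValueError("predictions must have the same length")
    return [n + w * (p - n) for p, n in zip(eps_pos, eps_neg)]
```

For `w > 1` the output extrapolates past `eps_pos` in the direction away from `eps_neg`, so the quality of the negative branch directly shapes what the sampler is steered away from.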

Instruction-based editing introduces yet another reward source. In “Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback,” DiffusionNFT is combined with a training-free MLLM reward model. Candidate edits are scored through constrained score-token logits over Lft(θ)=LDiff(θ)+λOutKDLOutKD(θ)+λFeatKDLFeatKD(θ).L_{\mathrm{ft}}(\theta)=L_{\mathrm{Diff}}(\theta)+\lambda_{\mathrm{OutKD}}L_{\mathrm{OutKD}}(\theta)+\lambda_{\mathrm{FeatKD}}L_{\mathrm{FeatKD}}(\theta).4, normalized to Lft(θ)=LDiff(θ)+λOutKDLOutKD(θ)+λFeatKDLFeatKD(θ).L_{\mathrm{ft}}(\theta)=L_{\mathrm{Diff}}(\theta)+\lambda_{\mathrm{OutKD}}L_{\mathrm{OutKD}}(\theta)+\lambda_{\mathrm{FeatKD}}L_{\mathrm{FeatKD}}(\theta).5, and converted into the optimality probabilities used by the flow-matching NFT loss. Because near-saturated groups can create small standard deviations and unstable normalization, the paper discards groups satisfying Lft(θ)=LDiff(θ)+λOutKDLOutKD(θ)+λFeatKDLFeatKD(θ).L_{\mathrm{ft}}(\theta)=L_{\mathrm{Diff}}(\theta)+\lambda_{\mathrm{OutKD}}L_{\mathrm{OutKD}}(\theta)+\lambda_{\mathrm{FeatKD}}L_{\mathrm{FeatKD}}(\theta).6 and Lft(θ)=LDiff(θ)+λOutKDLOutKD(θ)+λFeatKDLFeatKD(θ).L_{\mathrm{ft}}(\theta)=L_{\mathrm{Diff}}(\theta)+\lambda_{\mathrm{OutKD}}L_{\mathrm{OutKD}}(\theta)+\lambda_{\mathrm{FeatKD}}L_{\mathrm{FeatKD}}(\theta).7. This low-STD group filtering is presented as a variance-reduction mechanism rather than a new reward function (Li et al., 19 Oct 2025).
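The two mechanisms in this paragraph, logit-based scoring and low-STD group filtering, can be sketched together. The sketch assumes that the evaluator exposes one logit per allowed integer score token, that the reward is the softmax-expected score rescaled to $[0, 1]$, and that a group is discarded when its mean reward is high while its standard deviation is low. The score range, both thresholds, and all names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Dict, Sequence


def expected_score(score_logits: Dict[int, float]) -> float:
    """Softmax over the allowed score tokens, then expected score in [0, 1]."""
    if len(score_logits) < 2:
        raise ValueError("need at least two distinct score tokens")
    top = max(score_logits.values())  # subtract the max for numerical stability
    weights = {s: math.exp(logit - top) for s, logit in score_logits.items()}
    total = sum(weights.values())
    raw = sum(s * w for s, w in weights.items()) / total
    lo, hi = min(score_logits), max(score_logits)
    return (raw - lo) / (hi - lo)


def keep_group(rewards: Sequence[float], mean_thresh: float = 0.9,
               std_thresh: float = 0.05) -> bool:
    """Return False for near-saturated, low-variance groups (to be dropped)."""
    saturated = mean(rewards) > mean_thresh
    low_variance = pstdev(rewards) < std_thresh
    return not (saturated and low_variance)
```

Using the expected score rather than the single most likely token keeps the reward continuous, which helps avoid the ties that make per-group normalization degenerate.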

5. Empirical behavior across tasks

For controlled fine-tuning and concept suppression, the strongest quantitative evidence comes from artist-style erasure and NSFW removal on pruned SD2.1. On artist removal, the bilevel method reports CLIP Lft(θ)=LDiff(θ)+λOutKDLOutKD(θ)+λFeatKDLFeatKD(θ).L_{\mathrm{ft}}(\theta)=L_{\mathrm{Diff}}(\theta)+\lambda_{\mathrm{OutKD}}L_{\mathrm{OutKD}}(\theta)+\lambda_{\mathrm{FeatKD}}L_{\mathrm{FeatKD}}(\theta).8, CP Lft(θ)=LDiff(θ)+λOutKDLOutKD(θ)+λFeatKDLFeatKD(θ).L_{\mathrm{ft}}(\theta)=L_{\mathrm{Diff}}(\theta)+\lambda_{\mathrm{OutKD}}L_{\mathrm{OutKD}}(\theta)+\lambda_{\mathrm{FeatKD}}L_{\mathrm{FeatKD}}(\theta).9, CSD Lunlearn(θ)=E[ϵT(xt,t,c)ϵS(xt,t,c)2].L_{\mathrm{unlearn}}(\theta)=\mathbb{E}\big[\|\epsilon_T(x_t,t,c')-\epsilon_S(x_t,t,c)\|^2\big].0, FID Lunlearn(θ)=E[ϵT(xt,t,c)ϵS(xt,t,c)2].L_{\mathrm{unlearn}}(\theta)=\mathbb{E}\big[\|\epsilon_T(x_t,t,c')-\epsilon_S(x_t,t,c)\|^2\big].1, and COCO CLIP Lunlearn(θ)=E[ϵT(xt,t,c)ϵS(xt,t,c)2].L_{\mathrm{unlearn}}(\theta)=\mathbb{E}\big[\|\epsilon_T(x_t,t,c')-\epsilon_S(x_t,t,c)\|^2\big].2, compared with Distilled + ConceptPrune at CLIP Lunlearn(θ)=E[ϵT(xt,t,c)ϵS(xt,t,c)2].L_{\mathrm{unlearn}}(\theta)=\mathbb{E}\big[\|\epsilon_T(x_t,t,c')-\epsilon_S(x_t,t,c)\|^2\big].3, CP Lunlearn(θ)=E[ϵT(xt,t,c)ϵS(xt,t,c)2].L_{\mathrm{unlearn}}(\theta)=\mathbb{E}\big[\|\epsilon_T(x_t,t,c')-\epsilon_S(x_t,t,c)\|^2\big].4, CSD Lunlearn(θ)=E[ϵT(xt,t,c)ϵS(xt,t,c)2].L_{\mathrm{unlearn}}(\theta)=\mathbb{E}\big[\|\epsilon_T(x_t,t,c')-\epsilon_S(x_t,t,c)\|^2\big].5, FID Lunlearn(θ)=E[ϵT(xt,t,c)ϵS(xt,t,c)2].L_{\mathrm{unlearn}}(\theta)=\mathbb{E}\big[\|\epsilon_T(x_t,t,c')-\epsilon_S(x_t,t,c)\|^2\big].6, and COCO CLIP Lunlearn(θ)=E[ϵT(xt,t,c)ϵS(xt,t,c)2].L_{\mathrm{unlearn}}(\theta)=\mathbb{E}\big[\|\epsilon_T(x_t,t,c')-\epsilon_S(x_t,t,c)\|^2\big].7. 
On NSFW removal, the paper states that nudity reduction is comparable to baselines while retaining quality, and reports adversarial robustness scores of Lunlearn(θ)=E[ϵT(xt,t,c)ϵS(xt,t,c)2].L_{\mathrm{unlearn}}(\theta)=\mathbb{E}\big[\|\epsilon_T(x_t,t,c')-\epsilon_S(x_t,t,c)\|^2\big].8 on MMA and Lunlearn(θ)=E[ϵT(xt,t,c)ϵS(xt,t,c)2].L_{\mathrm{unlearn}}(\theta)=\mathbb{E}\big[\|\epsilon_T(x_t,t,c')-\epsilon_S(x_t,t,c)\|^2\big].9 on Ring-A-Bell (Shirkavand et al., 2024).

For online diffusion RL, the central empirical claim is efficiency under solver-agnostic, CFG-free training. The reported comparison with FlowGRPO emphasizes both speed and sample efficiency: DiffusionNFT reaches GenEval DfD_f0 within DfD_f1 steps, whereas FlowGRPO reaches DfD_f2 only after more than DfD_f3 steps and with additional CFG employment. The method is also reported to outperform or match larger CFG-based models while using a single conditional model at inference (Zheng et al., 19 Sep 2025).

For preference alignment, Diffusion-NPO presents DiffusionNFT as negative-preference finetuning in weight space. On SD1.5, Diff.-SPO improves over the base model, but adding NPO further improves several metrics; for example, Diff.-SPO + NPO reports PickScore DfD_f4, HPSv2 DfD_f5, ImageReward DfD_f6, and LAION-Aesthetic DfD_f7. On DreamShaper, NPO improves PickScore from DfD_f8 to DfD_f9 and HPSv2 from cc0 to cc1 at cc2, and increases LAION-Aesthetic to cc3 at cc4. The same paper also reports gains on SDXL and VideoCrafter2/VADER, including out-of-domain human videos (Wang et al., 16 May 2025).

For image editing, Edit-R1 applies flow-matching DiffusionNFT with MLLM reward to FLUX.1-Kontext, Qwen-Image-Edit, and UniWorld-V2. The reported final scores are cc5 on ImgEdit and cc6 on GEdit-Bench for UniWorld-V2. The same framework improves FLUX.1-Kontext [Dev] from cc7 to cc8 on ImgEdit and from cc9 to ϵT\epsilon_T00 on GEdit-Bench, and improves Qwen-Image-Edit from ϵT\epsilon_T01 to ϵT\epsilon_T02 on ImgEdit and from ϵT\epsilon_T03 to ϵT\epsilon_T04 on GEdit-Bench (Li et al., 19 Oct 2025).

A plausible implication is that DiffusionNFT is especially useful when the adaptation problem is under-specified by positive supervision alone. In pruned-model deployment, this appears as selective unlearning without a second stage; in online RL, as avoidance of low-reward modes without reverse likelihoods; in CFG alignment, as an explicitly trained negative branch; and in editing, as reward-driven correction beyond supervised instruction pairs.

6. Limitations, failure modes, and adjacent debates

The literature also makes clear that negative-aware finetuning is not automatically stable or irreversible. In the bilevel concept-suppression setting, reported risks include over-suppression, style bleeding, reduced diversity, catastrophic forgetting, prompt sensitivity, and residual leakage under adversarial prompts. The recommended mitigations are balanced inner/outer cadence, conservative guidance strength, careful choice of anchor concepts, prompt augmentation, and runtime safety filters (Shirkavand et al., 2024).

The online RL lineage introduces a different set of sensitivities. DiffusionNFT depends on reward calibration, off-policy update scheduling, and the choice of ϵT\epsilon_T05. The paper reports that smaller ϵT\epsilon_T06 can accelerate reward improvement but may destabilize training, while overly on-policy updates can be fast early and unstable later. Reward hacking remains a general issue when reward models are imperfect (Zheng et al., 19 Sep 2025). The value-modeling literature adds that Tweedie-style estimators are biased at high noise, Monte Carlo rollouts are expensive and high-variance, and even stitched value models can weaken at very high noise levels, motivating intermediate stopping windows such as ϵT\epsilon_T07 on a ϵT\epsilon_T08-step schedule (Go et al., 19 May 2026).

Instruction-based editing makes the reward problem especially visible. Edit-R1 reports that small MLLM evaluators can collapse reward variance and encourage reward hacking, whereas larger evaluators such as ϵT\epsilon_T09 models maintain higher variance and more stable optimization. Low-STD group filtering improves stability, but it also discards training data and therefore trades sample efficiency for variance control (Li et al., 19 Oct 2025).

A separate critique concerns robustness of finetuning-based unlearning itself. “Towards Irreversible Machine Unlearning for Diffusion Models” proposes DiMRA, an auxiliary-data relearning attack that can reverse finetuning-based unlearning methods such as ESD, CA, SalUn, SHS, and Ediff, and argues that many such methods remain near the original optimum because of non-convergent or mismatched unlearning terms. Its defense, DiMUM, replaces forgetting-style objectives with convergent memorization of alternative content under the unlearned condition. This does not refute DiffusionNFT as a training principle, but it does qualify a common assumption: negative-aware fine-tuning is not necessarily irreversible unless the objective settles into a genuinely new stable optimum (Yuan et al., 3 Dec 2025).

Finally, the term’s breadth can itself be a source of confusion. Some papers use DiffusionNFT for concept suppression in conditional diffusion, others for online RL with flow matching, and others for negative-preference guidance or editing post-training. The common thread is explicit negative supervision inside the finetuning objective. Beyond that shared principle, the concrete optimization geometry, target parameterization, and evaluation protocol differ substantially across subfields (Shirkavand et al., 2024, Zheng et al., 19 Sep 2025, Wang et al., 16 May 2025).
