
Negative-Aware Fine-Tuning (NFT)

Updated 3 December 2025
  • NFT is a fine-tuning strategy that integrates negative samples—misleading or noisy inputs—to address model bias, negative transfer, and confounders.
  • It employs diverse mechanisms including token-level forgetting, patch-level contrastive loss, and label-aware reweighting to sharpen decision boundaries.
  • Empirical results demonstrate that NFT improves accuracy, calibration, and out-of-distribution robustness across various applications such as language, vision, and code generation.

Negative-Aware Fine-Tuning (NFT) is an umbrella designation for an array of supervised and contrastive fine-tuning methodologies that incorporate information from negative samples—examples, tokens, features, or subspaces associated with incorrect labels, distributional mismatch, or undesirable model behavior—to improve model robustness, generalization, and interpretability. Across contemporary domains—text, code, vision, and multi-modal—the central innovation of NFT is to detect, weight, penalize, or explicitly “forget” negative knowledge during model adaptation, rather than relying purely on positive or reference-aligned training signals.

1. Theoretical Foundations and Problem Motivation

The driving motivation for NFT frameworks is the observed failure of conventional fine-tuning to handle model bias, negative transfer, overfitting to confounders, or inability to discriminate subtle semantic or structural distinctions. In LLMs, standard supervised fine-tuning uniformly minimizes cross-entropy over all output tokens or examples, disregarding whether certain spans are misleading, noisy, or detrimental to downstream performance (Ghahrizjani et al., 6 Aug 2025). In vision and vision-language models, negative transfer arises from spuriously correlated or rare features in the pretraining distribution, undermining domain or concept generalization (Yang et al., 2023).

NFT generalizes earlier contrastive paradigms—e.g., positive/negative sampling in supervised contrastive learning—by introducing negative-aware weighting, explicit modeling of negative subspaces or policies, and decoupling of learning for tokens/features with negative influence. In formal causal terms, NFT methods often seek to "block" or "adjust" for confounding influences either through causal front-door adjustment or parameterization of negative subspaces or heads.

2. Formulations and Core Algorithms

NFT instantiations diverge in their granularity (token, patch, feature, policy), domain (NLP, vision, code), and technical mechanism. Core approaches include:

  • Token-level forgetting in LLMs: Tokens are partitioned using a quality score based on cross-model loss influence. Positive tokens are optimized via standard likelihood minimization, while negative tokens are “forgotten” via maximization (gradient ascent on their loss), shaping sharper knowledge boundaries. The loss at iteration t takes the form

\mathcal{L}(\theta; t) = \frac{1}{|P|}\sum_{(i,j)\in P} \ell(y_{i,j};\theta) \;-\; \lambda(t)\,\frac{1}{|N|}\sum_{(i,j)\in N} \ell(y_{i,j};\theta)

with a dynamic penalty schedule λ(t) (Ghahrizjani et al., 6 Aug 2025).
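The split objective above can be sketched in a few lines. This is a minimal scalar illustration, not the authors' implementation: the function name and the linear ramp for λ(t) are assumptions, and in practice the terms would be differentiable tensors rather than Python floats.

```python
def nft_loss(token_losses, negative_mask, step, total_steps, lam_max=0.1):
    """Token-level forgetting objective: descend on positive tokens,
    ascend (via the subtracted term) on negative tokens."""
    # Partition per-token losses into the positive set P and negative set N.
    pos = [l for l, is_neg in zip(token_losses, negative_mask) if not is_neg]
    neg = [l for l, is_neg in zip(token_losses, negative_mask) if is_neg]
    # Dynamic penalty schedule lambda(t); a linear ramp is one simple choice.
    lam = lam_max * step / total_steps
    pos_term = sum(pos) / len(pos) if pos else 0.0
    neg_term = sum(neg) / len(neg) if neg else 0.0
    # Minimizing this quantity maximizes the loss on negative tokens.
    return pos_term - lam * neg_term
```

With three tokens, the last flagged as negative, the positive mean is averaged normally while the negative token's loss is subtracted with weight λ(t).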

  • Concept-wise (patch-level) negative transfer minimization: Features at the patch level are categorized as rare or spuriously correlated. Rare feature representations are strengthened via mutual-information-maximizing contrastive objectives, while spurious correlation is ablated using front-door adjustment implemented as dual attention networks. The architecture alternates between these modules, with sample-level contrastive loss and channel/patch-wise attentions (Yang et al., 2023).
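The mutual-information-maximizing contrastive objective for rare features is typically an InfoNCE-style bound. The sketch below shows only that contrastive piece under assumed cosine similarity; the dual-attention front-door adjustment module of Yang et al. (2023) is not reproduced here.

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE loss for one anchor patch feature: low when the anchor is
    much closer to its positive view than to any negative feature."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Similarity logits: positive first, then all negatives, scaled by temperature.
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                    # cross-entropy against the positive slot
```

When the negative is orthogonal to the anchor the loss is near zero; when it is identical to the positive, the loss degrades to log 2, reflecting an uninformative contrast.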
  • Label-aware contrastive losses: For fine-grained classification, a secondary network learns inter-class similarities, allowing the model to weight negative (confusable) classes more heavily during supervised contrastive learning. The loss is formulated as

\ell_i(p) = -\log\left(\frac{w_{i,y_i}\,\exp(h_i \cdot h_p/\tau)}{\sum_{k\neq i} w_{i,y_k}\,\exp(h_i \cdot h_k/\tau)}\right)

where the w_{i,y_k} encode class proximity/confusion (Suresh et al., 2021).
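The weighted term ℓ_i(p) can be computed directly from the formula. This is a didactic sketch, not the authors' code: the weight table W (indexed by anchor class and candidate class, in place of the learned similarity network) and the function signature are assumptions.

```python
import numpy as np

def label_aware_loss(h, labels, i, p, W, tau=0.1):
    """Weighted supervised-contrastive term l_i(p) for anchor i and positive p.
    h: (B, d) embeddings; W[a][b]: weight for anchor class a vs. class b,
    higher for more confusable (negative) class pairs."""
    # Numerator: anchor-positive similarity, weighted by the anchor's own class.
    num = W[labels[i]][labels[i]] * np.exp(h[i] @ h[p] / tau)
    # Denominator: all other samples, each weighted by its class confusability.
    den = sum(W[labels[i]][labels[k]] * np.exp(h[i] @ h[k] / tau)
              for k in range(len(h)) if k != i)
    return -np.log(num / den)
```

With all weights equal to 1 this reduces to the standard supervised contrastive term; raising W for a confusable class pair increases that pair's contribution to the denominator and hence the repulsive pressure.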

  • Post-hoc negative-aware reweighting in embeddings: Without further model updates, a softmax-weighted score vector is computed over embedding dimensions most responsive to contrastive negation cues via small-scale triplet supervision. The weighted embeddings yield improved performance on negation-sensitive tasks (Cao, 1 Apr 2025).
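Because this variant touches no model parameters, it amounts to an element-wise rescaling of the embedding. A minimal sketch, assuming the per-dimension scores have already been learned from triplet supervision; rescaling the softmax weights back to mean 1 is one convention chosen here, not necessarily the paper's.

```python
import numpy as np

def reweight_embedding(embedding, scores, temperature=1.0):
    """Post-hoc reweighting: softmax over dimension scores picks out the
    dimensions most responsive to negation cues; no model update needed."""
    s = scores / temperature
    w = np.exp(s - s.max())
    w /= w.sum()                    # softmax over embedding dimensions
    w *= len(scores)                # rescale so uniform scores leave the embedding unchanged
    return embedding * w
```

With uniform scores the output equals the input, so the adaptation only reshapes the embedding where the triplet supervision found a signal.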
  • Parameter-efficient attention debiasing: The Negative Attention Score (NAS) aligns a small subset of attention heads in LLMs associated with negative output bias, freezing the rest. The loss is standard negative-class cross-entropy, but updates only affect identified negative-bias heads; early stopping and rollback is governed by NAS dynamics (Yu et al., 31 Jul 2024).
  • Negative feedback–aware policy optimization: In LLMs for math reasoning with only verifier signals, the "negative" policy is constructed via

\pi^-_\theta(a \mid x) := \frac{\pi(a \mid x) - \bar r(x)\,\pi^+_\theta(a \mid x)}{1 - \bar r(x)}

and the NFT objective jointly maximizes log-likelihood of positive and negative samples under respective posterior policies, bridging supervised and RL methods (Chen et al., 23 May 2025).
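The implicit negative policy follows from a mixture identity: the reference policy decomposes as π = r̄·π⁺ + (1−r̄)·π⁻, so π⁻ is recoverable without a separate network. A one-line sketch of that algebra on scalar probabilities (function name assumed):

```python
def negative_policy(pi_ref, pi_pos, r_bar):
    """Recover the implicit negative policy pi_minus(a|x) from the
    reference policy, the learned positive policy, and the mean
    verifier reward r_bar(x), via pi_ref = r_bar*pi_pos + (1-r_bar)*pi_minus."""
    return (pi_ref - r_bar * pi_pos) / (1.0 - r_bar)
```

For example, if an answer has reference probability 0.5, positive-policy probability 0.8, and the mean reward is 0.5, the implied negative-policy probability is 0.2, so the NFT objective can penalize it without ever sampling from a separate negative model.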

  • Dynamic, fine-grained loss reweighting for code: Error-sensitive code regions (identified by matching correct/incorrect solutions via diffs) receive dynamic, higher loss weights depending on model confusion between variants. The reweighted cross-entropy provides strong inductive pressure on error-discriminative tokens (Fan et al., 21 Mar 2025).
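The diff-based identification of error-sensitive regions can be sketched with the standard library. This is a simplified illustration: the function name is hypothetical, and the fixed `boost` stands in for the dynamic, confusion-dependent weight described above.

```python
import difflib

def error_sensitive_weights(correct_tokens, incorrect_tokens, base=1.0, boost=2.0):
    """Give higher loss weight to tokens of the correct solution that differ
    from a paired incorrect solution (the error-sensitive regions)."""
    matcher = difflib.SequenceMatcher(a=incorrect_tokens, b=correct_tokens)
    weights = [base] * len(correct_tokens)
    for tag, _, _, j1, j2 in matcher.get_opcodes():
        if tag != "equal":          # this span of the correct solution diverges from the bug
            for j in range(j1, j2):
                weights[j] = boost
    return weights
```

The resulting per-token weights would then multiply the cross-entropy terms, concentrating gradient signal on exactly the tokens that distinguish a working solution from its buggy variant.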

3. Empirical Results and Quantitative Evidence

NFT paradigms yield consistent improvements across multiple domains and benchmarks:

| Domain/Setting | NFT Variant | Representative Quantitative Gain | Reference |
|---|---|---|---|
| LLM fine-tuning (QA/math) | Token-level forgetting | +4.5–8.3% avg. over SFT across model sizes | (Ghahrizjani et al., 6 Aug 2025) |
| Fine-grained image classification | Concept-wise (patch-level) | +1.09% (avg.), up to +4.76% (low data) over SOTA | (Yang et al., 2023) |
| Code generation | Dynamic reweighting | +6.9% pass@1 avg. over SFT, up to +20.8% on hard code | (Fan et al., 21 Mar 2025) |
| Universal text embeddings | Post-hoc reweighting | +14.5 ppt on complex negation; +11.9 ppt for LLM-based | (Cao, 1 Apr 2025) |
| Binary reasoning in LLMs | NAS alignment | F1 increase of +3–25 ppt, Prec–Rec | (Yu et al., 31 Jul 2024) |
| Math reasoning | Implicit negative policy | +2–10% accuracy over SFT, matches/surpasses RL | (Chen et al., 23 May 2025) |

NFT not only increases standard accuracy but also sharpens decision boundaries (as seen in entropy/F1 metrics), improves domain/out-of-distribution (OOD) detection, corrects negative bias, and often enhances calibration. In code generation, NFT outperforms both RL-based and static contrastive approaches on error-sensitive cases.

4. Methodological Variants and Key Design Principles

NFT methods are contextually tailored:

  • Granularity: Token-level (language), patch/feature-level (vision), code token/line (program synthesis), policy-level (RL/LLM).
  • Negative sample selection: Empirical quality score, contrastive triplets, pairing with adversarial/wrong outputs, head-level attention analysis.
  • Parameter updates: Range from full-model updates (standard supervised learning) and dual networks (label-aware contrast), through attention-head freezing (parameter-efficient), to no parameter change at all (post-hoc embedding reweighting).
  • Negative handling: Loss maximization (forgetting), mutual-information maximization (for rare features), dynamic per-instance loss weighting, front-door adjustment for causality.
  • Efficiency: Post-hoc reweighting and head-level updates are computationally economical, making NFT practical for large models and real-time adaptation.

5. Limitations, Open Problems, and Future Directions

NFT methods rely on the model’s pre-existing ability to encode the targeted negative-aware signal—reweighting or attention-based correction can only be effective if the underlying features or heads represent the relevant semantic distinctions (Cao, 1 Apr 2025). Selection of hyperparameters (e.g., negative-forgetting schedule, weighting temperature, head selection thresholds) remains sensitive, but stable regions have been empirically demonstrated (Ghahrizjani et al., 6 Aug 2025, Fan et al., 21 Mar 2025). Tasks with highly entangled, non-separable negative and positive features may require joint NFT and more radical architectural adaptation.

How to combine multiple negative cues (e.g., negation + sentiment, OOD + rare features), or dynamically compose weighting vectors for joint or multitask settings, is unresolved (Cao, 1 Apr 2025). NFT does not modify the backbone subspace or reparameterize new features; for representational gaps, conventional fine-tuning or light model updates may be necessary. In LLMs, alignment with RL-based training (e.g., gradient equivalence to GRPO under on-policy) suggests convergent theory, but functional differences persist off-policy (Chen et al., 23 May 2025).

6. Applications and Broader Impact

NFT methodologies have demonstrated efficacy in the following high-value contexts:

  • Robust LLM deployment: NFT increases the resilience of LLMs to noisy, misleading, or out-of-distribution input, and curbs negative bias in yes/no or verification tasks (Yu et al., 31 Jul 2024, Ghahrizjani et al., 6 Aug 2025).
  • Visual reasoning and fine-grained recognition: NFT bolsters performance on rare or underrepresented visual concepts and mitigates spurious correlation transfer (Yang et al., 2023, Zhu et al., 26 Jul 2025).
  • Code synthesis: By localizing and emphasizing error-sensitive regions, NFT boosts pass@1 rates, outperforming static SFT and reward-based RL (Fan et al., 21 Mar 2025).
  • Universal embedding adaptation: Lightweight NFT via embedding reweighting enables targeted adaptation (e.g., negation awareness) for downstream tasks without sacrificing overall utility (Cao, 1 Apr 2025).

NFT has also been applied as a parameter-efficient adaptation mechanism (e.g., in NASA for LLM heads), and as a bridge between conventional supervised learning and RL for models with sparse or binary supervision (Chen et al., 23 May 2025).


NFT crystallizes a unified perspective on negative-signal integration for model adaptation, leveraging principled partitioning, dynamic loss shaping, causal adjustment, and parameter-efficient response to improve discrimination, robustness, and OOD resilience across model classes and data domains.
