
KR-NFT: Knowledge Regularized Negative Feature Tuning

Updated 3 July 2026
  • The paper presents a novel methodology (KR-NFT) that eliminates the need for negative sampling by introducing an explicit knowledge-regularized objective, enhancing calibration and OOD detection.
  • It integrates distribution-aware feature tuning with a global score constraint to reduce computational overhead and mitigate label noise in both vision-language and knowledge graph applications.
  • Empirical results show significant improvements, such as reduced NLL, lower Brier scores, and decreased FPR95, supporting the method’s efficiency and robustness.

Knowledge Regularized Negative Feature Tuning (KR-NFT) is a methodology that enhances the reliability and generalization of machine learning systems in settings where traditional negative-sample-based training is problematic or inefficient. Notably, KR-NFT provides a principled optimization framework for out-of-distribution (OOD) detection in vision-language models (Zhu et al., 26 Jul 2025) and for knowledge graph completion without negative sampling (Hajimoradlou et al., 2022). Its core innovation is an explicit knowledge-regularized objective that removes the dependence on negative examples while efficiently calibrating the model's output distributions.

1. Foundational Principles and Motivation

Traditional negative prompt or negative sampling schemes in deep learning architectures regularize representation spaces by forcing a distinction between known (positive) and unknown (negative or OOD) classes, facts, or samples. However, in knowledge graphs and compositional reasoning, negative samples are typically generated heuristically, which risks label noise and significantly increases computational overhead (Hajimoradlou et al., 2022). Similarly, in vision-language OOD detection, negative prompt tuning, while effective, often undermines generalization to unseen categories or styles and is computationally demanding (Zhu et al., 26 Jul 2025).

KR-NFT circumvents these limitations through a knowledge regularization paradigm. It applies distribution-aware feature transformations or global score constraints that encourage the model to reserve high confidence for positive (in-distribution or true) examples while systematically limiting the overconfident scoring of unseen or out-of-distribution instances. This is achieved by explicitly regularizing the model's output space to reflect inductive priors or calibration targets, obviating the need for explicit negative data.

2. Formalism and Objective Function

The central training objective of KR-NFT comprises two synergistic components: a positive-instance loss and a knowledge-regularized global constraint.

Let $E$ be the set of entities and $R$ the set of relations, with each entity $e \in E$ represented as a vector $z_e \in \mathbb{R}^d$ and each relation $r \in R$ as a matrix $Z_r \in \mathbb{R}^{d \times d}$. The bilinear score for a triple $(h,r,t)$ is

$$\phi_M(\theta)(h,r,t) = z_h^\top Z_r z_t$$
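Computationally, the bilinear score is one matrix-vector product followed by a dot product. The pure-Python sketch below illustrates it on hypothetical toy vectors; `bilinear_score` and `distmult_score` are illustrative names. The DistMult variant is the standard special case in which $Z_r$ is diagonal, which reduces the per-score cost from $O(d^2)$ to $O(d)$.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def bilinear_score(z_h: Vector, Z_r: Matrix, z_t: Vector) -> float:
    """phi(h, r, t) = z_h^T Z_r z_t for a full d x d relation matrix: O(d^2)."""
    d = len(z_h)
    # First compute Z_r z_t, then take its dot product with z_h.
    Zr_zt = [sum(Z_r[i][j] * z_t[j] for j in range(d)) for i in range(d)]
    return sum(z_h[i] * Zr_zt[i] for i in range(d))


def distmult_score(z_h: Vector, w_r: Vector, z_t: Vector) -> float:
    """DistMult: the special case Z_r = diag(w_r), costing O(d)."""
    return sum(a * w * b for a, w, b in zip(z_h, w_r, z_t))


# Hypothetical toy embeddings with d = 2.
z_h = [1.0, 2.0]
z_t = [3.0, -1.0]
Z_r = [[0.5, 0.0], [1.0, 2.0]]
print(bilinear_score(z_h, Z_r, z_t))  # prints 3.5
```

A diagonal relation matrix gives the same score through either function, e.g. `distmult_score(z_h, [0.5, 2.0], z_t)` equals `bilinear_score(z_h, [[0.5, 0.0], [0.0, 2.0]], z_t)`.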

  • Stay-Positive Regularizer:

Penalizes deviation of the model's average score over all $|E|^2|R|$ possible triples from a prior $\psi$ (written here in summed form):

$$\mathcal{L}^{sp}(\theta) = \left\| \sum_{h \in E}\sum_{r \in R}\sum_{t \in E}\phi_M(\theta)(h,r,t) - \psi\,|E|^2\,|R| \right\|_p$$

Alternatively, with […],

[…]

Typically […].
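Evaluating $\mathcal{L}^{sp}$ naively requires scoring all $|E|^2|R|$ triples. Because the score is bilinear, however, $\sum_{h\in E}\sum_{t\in E} z_h^\top Z_r z_t = \bar z^\top Z_r \bar z$ with $\bar z = \sum_{e\in E} z_e$, so the triple sum can be computed from a single summed entity vector. The sketch below checks this standard identity numerically on random toy embeddings; it illustrates the algebra and is not a claim about how the original implementation is written. The prior value `psi = 0.1` is hypothetical. Since the quantity inside the norm is a scalar, any $p$-norm reduces to its absolute value.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def bilinear_score(z_h: Vector, Z_r: Matrix, z_t: Vector) -> float:
    """phi(h, r, t) = z_h^T Z_r z_t."""
    d = len(z_h)
    return sum(z_h[i] * Z_r[i][j] * z_t[j] for i in range(d) for j in range(d))


def stay_positive_bruteforce(ents: List[Vector], rels: List[Matrix], psi: float) -> float:
    """Enumerate all |E|^2 |R| triples: O(|E|^2 |R| d^2)."""
    total = sum(bilinear_score(zh, Zr, zt) for zh in ents for Zr in rels for zt in ents)
    # The argument of the norm is a scalar, so the p-norm is its absolute value.
    return abs(total - psi * len(ents) ** 2 * len(rels))


def stay_positive_factorized(ents: List[Vector], rels: List[Matrix], psi: float) -> float:
    """Use sum_h sum_t z_h^T Z_r z_t = zbar^T Z_r zbar: O(|E| d + |R| d^2)."""
    d = len(ents[0])
    zbar = [sum(z[k] for z in ents) for k in range(d)]
    total = sum(bilinear_score(zbar, Zr, zbar) for Zr in rels)
    return abs(total - psi * len(ents) ** 2 * len(rels))


random.seed(0)
d, n_ent, n_rel = 3, 5, 2
ents = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_ent)]
rels = [[[random.gauss(0, 1) for _ in range(d)] for _ in range(d)] for _ in range(n_rel)]
psi = 0.1  # hypothetical prior on the mean triple score

brute = stay_positive_bruteforce(ents, rels, psi)
fast = stay_positive_factorized(ents, rels, psi)
print(brute, fast)  # equal up to floating-point rounding
```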

  • Full Loss Function:

For the set RR3 of positive (true) triples,

RR4

Here, RR5 controls the strength of the regularizer, and RR6.

This framework trains solely on positive data, enforcing a global constraint on the score distribution that indirectly suppresses the scores for all unobserved (candidate-negative or OOD) instances (Hajimoradlou et al., 2022).
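The exact form of the full loss did not survive on this page, so the following is only a sketch under stated assumptions: it uses DistMult scores, assumes the positive-instance term is $-\log\sigma(\phi) = \operatorname{softplus}(-\phi)$ summed over the positive triples, and adds the stay-positive regularizer scaled by a hypothetical weight `lam`. For DistMult the triple sum in the regularizer collapses to $\sum_r \sum_k \bar z_k^2\, w_{r,k}$. Only observed triples appear in `positives`; the regularizer is the sole source of pressure on unobserved scores.

```python
import math
from typing import List, Tuple

Vector = List[float]
Triple = Tuple[int, int, int]  # (head, relation, tail) indices


def distmult_score(z_h: Vector, w_r: Vector, z_t: Vector) -> float:
    # DistMult: bilinear score with a diagonal relation matrix diag(w_r).
    return sum(a * w * b for a, w, b in zip(z_h, w_r, z_t))


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def stay_positive(ents: List[Vector], rels: List[Vector], psi: float) -> float:
    # For DistMult, the sum over all triples equals sum_r sum_k zbar_k^2 * w_rk,
    # where zbar is the sum of all entity embeddings.
    d = len(ents[0])
    zbar = [sum(z[k] for z in ents) for k in range(d)]
    total = sum(zbar[k] ** 2 * w[k] for w in rels for k in range(d))
    return abs(total - psi * len(ents) ** 2 * len(rels))


def full_loss(positives: List[Triple], ents: List[Vector], rels: List[Vector],
              psi: float, lam: float) -> float:
    # ASSUMPTION: the positive term is -log sigmoid(phi) = softplus(-phi).
    pos_term = sum(softplus(-distmult_score(ents[h], rels[r], ents[t]))
                   for h, r, t in positives)
    return pos_term + lam * stay_positive(ents, rels, psi)


# Hypothetical toy graph: 3 entities, 1 relation, 2 observed triples, no negatives.
ents = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rels = [[1.0, -1.0]]
positives = [(0, 0, 2), (1, 0, 1)]
print(full_loss(positives, ents, rels, psi=0.1, lam=0.5))
```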

3. Architecture and Optimization Strategy

KR-NFT employs architecture-specific mechanisms for different modalities:

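  • Negative Feature Tuning (NFT) (Zhu et al., 26 Jul 2025):
    • Distribution-aware Feature Transformation: Transforms the pre-trained text features in a distribution-aware manner, pushing in-distribution and out-of-distribution representations apart.
    • Image-conditional Adaptation: Uses a lightweight meta-network to produce image-conditional learnable factors, enabling dynamic, per-instance adaptation and reducing sensitivity to class and style shifts.
  • Knowledge Regularization (KR):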

The optimization augments the NFT architecture with a loss term designed to maximize in-distribution and out-of-distribution separability while minimizing catastrophic forgetting of pre-trained knowledge.
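NFT and KR are described here only at a high level, so the toy sketch below is a hypothetical illustration of the ideas, not Zhu et al.'s implementation. Every component is an assumption chosen for clarity: `transform` (a per-dimension scale and shift, with separate parameters for positive and negative features), `meta_net` (a single linear layer producing an image-conditional offset), the softmax-mass `id_score`, and `kr_loss` (cross-entropy over positive and negative features plus a drift penalty standing in for forgetting mitigation). Image and text features are taken as precomputed inputs.

```python
import math
from typing import Dict, List, Tuple

Vec = List[float]


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def transform(feat: Vec, scale: Vec, shift: Vec, cond: Vec) -> Vec:
    # HYPOTHETICAL stand-in for the distribution-aware transformation:
    # per-dimension scale and shift, plus an image-conditional offset.
    return [s * x + b + c for x, s, b, c in zip(feat, scale, shift, cond)]


def meta_net(img: Vec, W: List[Vec]) -> Vec:
    # HYPOTHETICAL lightweight meta-network: a single linear layer.
    return [sum(w * x for w, x in zip(row, img)) for row in W]


def nft_logits(img: Vec, pos: List[Vec], neg: List[Vec], p: Dict,
               tau: float = 0.01) -> Tuple[List[float], List[Vec]]:
    cond = meta_net(img, p["W"])
    tuned_pos = [transform(f, p["pos_scale"], p["pos_shift"], cond) for f in pos]
    tuned_neg = [transform(f, p["neg_scale"], p["neg_shift"], cond) for f in neg]
    logits = [cosine(img, f) / tau for f in tuned_pos + tuned_neg]
    return logits, tuned_pos


def id_score(img: Vec, pos: List[Vec], neg: List[Vec], p: Dict) -> float:
    """Softmax mass on positive (ID) features; low values flag likely OOD inputs."""
    logits, _ = nft_logits(img, pos, neg, p)
    return sum(softmax(logits)[: len(pos)])


def kr_loss(img: Vec, label: int, pos: List[Vec], neg: List[Vec], p: Dict,
            beta: float = 1.0) -> float:
    # HYPOTHETICAL KR-style objective for one ID image of class `label`:
    # cross-entropy over positives + negatives pushes the image away from the
    # negative features; the drift term keeps tuned positive features close to
    # the frozen pre-trained ones (a stand-in for limiting forgetting).
    logits, tuned_pos = nft_logits(img, pos, neg, p)
    ce = -math.log(softmax(logits)[label])
    drift = sum((a - b) ** 2 for t, f in zip(tuned_pos, pos) for a, b in zip(t, f))
    return ce + beta * drift / len(pos)


# Toy 2-d example with identity parameters (all values hypothetical).
pos_feats = [[1.0, 0.1], [0.0, 1.0]]  # frozen ID class text features
neg_feats = [[-1.0, 0.0]]             # frozen negative text feature
params = {
    "pos_scale": [1.0, 1.0], "pos_shift": [0.0, 0.0],
    "neg_scale": [1.0, 1.0], "neg_shift": [0.0, 0.0],
    "W": [[0.0, 0.0], [0.0, 0.0]],
}
print(id_score([1.0, 0.0], pos_feats, neg_feats, params))   # ID-like image, close to 1
print(id_score([-1.0, 0.0], pos_feats, neg_feats, params))  # OOD-like image, close to 0
```

With identity parameters the tuned features equal the frozen ones, so the drift penalty is zero; training would adjust `pos_shift`, `neg_shift`, and `W` to widen the ID/OOD margin while the drift term limits how far the positive features move.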

In knowledge graph settings, KR-NFT forgoes the usual generation and scoring of negative triples, reducing the computational burden. The regularizer aligns the mean score across all triples with the prior $\psi$, which reflects the anticipated sparsity of true facts.

The pseudocode in (Hajimoradlou et al., 2022) iterates only over positive triples plus the global regularizer; no negative instances are constructed or scored.

4. Efficiency and Computational Properties

The elimination of negative sampling yields substantial improvements in training efficiency:

Approach            Sampled triples per epoch    Cost per epoch    Observed speedup
Negative sampling   […]                          […]               Baseline
KR-NFT              […] positives only + 1       […]               […] faster ([…])

Here, […] is the per-score computational cost of the bilinear scoring function (e.g., $O(d)$ for DistMult), and […] is the average positive frequency in […].

Empirical results show that, for a negative ratio of […] (typical in baseline setups), KR-NFT reduces the per-epoch computational cost by roughly an order of magnitude, while measured epoch times drop by 40–70% depending on implementation and batch structure (Hajimoradlou et al., 2022).
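A back-of-the-envelope operation count helps relate the two cost figures above. All sizes below are hypothetical, and the sketch assumes DistMult (about $d$ operations per score), a single regularizer evaluation per epoch, and a factorized DistMult regularizer (sum the entity vectors, then one weighted sum per relation). It ignores gradients, data loading, and framework overhead, which plausibly accounts for part of the gap between an operation-count ratio near an order of magnitude and measured epoch-time savings of 40–70%.

```python
def epoch_cost_negative_sampling(num_pos: int, neg_ratio: int, score_cost: int) -> int:
    # Every positive triple is scored together with neg_ratio sampled negatives.
    return num_pos * (1 + neg_ratio) * score_cost


def epoch_cost_stay_positive(num_pos: int, score_cost: int, reg_cost: int,
                             reg_evals: int = 1) -> int:
    # Positives only, plus reg_evals evaluations of the global regularizer.
    return num_pos * score_cost + reg_evals * reg_cost


# Hypothetical DistMult setting: d = 200, so one score costs about d operations.
d, num_ent, num_rel, num_pos, neg_ratio = 200, 40_000, 20, 100_000, 10
# Factorized DistMult regularizer: sum the entity vectors (num_ent * d),
# then one d-dimensional weighted sum per relation (num_rel * d).
reg_cost = num_ent * d + num_rel * d

neg = epoch_cost_negative_sampling(num_pos, neg_ratio, score_cost=d)
sp = epoch_cost_stay_positive(num_pos, score_cost=d, reg_cost=reg_cost)
print(f"{neg / sp:.1f}x fewer operations")  # prints "7.9x fewer operations"
```

If the regularizer cost were negligible, the ratio would approach $1 + k$ for a negative ratio $k$, i.e. 11x for the hypothetical $k = 10$ above.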

5. Experimental Results and Empirical Validation

KR-NFT demonstrates the following empirical properties:

  • Knowledge Graph Calibration (WN11):
    • DistMult:
      • NLL: […] (≈77% drop)
      • Brier: […] (49% drop)
      • AUC: […]
    • SimplE:
      • NLL: […]
      • Brier: […]
      • AUC: […]
  • Link Prediction Accuracy (filtered MRR on WN18AM):
    • DistMult: […]
    • SimplE: […]
  • FPR95 Reduction in Vision-Language OOD Detection:

In few-shot ImageNet settings, FPR95 is reduced by […] when generalizing to unseen ID categories (Zhu et al., 26 Jul 2025).

Thus, KR-NFT not only preserves or slightly improves task-specific accuracy and standard link prediction metrics but dramatically improves probability calibration and reduces OOD false-positive rates, all while increasing training efficiency.
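For reference, the calibration metrics (NLL, Brier score) and the OOD metric FPR95 reported above follow standard definitions, sketched below for binary labels. The function names and toy inputs are illustrative; `fpr_at_95_tpr` assumes that higher scores mean more in-distribution-like inputs.

```python
import math
from typing import List


def brier_score(probs: List[float], labels: List[int]) -> float:
    # Mean squared difference between predicted probability and the 0/1 label.
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def binary_nll(probs: List[float], labels: List[int], eps: float = 1e-12) -> float:
    # Mean negative log-likelihood of the labels; eps clips away log(0).
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def fpr_at_95_tpr(id_scores: List[float], ood_scores: List[float]) -> float:
    # Choose the threshold that accepts at least 95% of ID samples, then
    # report the fraction of OOD samples that are also accepted.
    ranked = sorted(id_scores, reverse=True)
    k = -(-95 * len(ranked) // 100)  # integer ceil(0.95 * n)
    threshold = ranked[k - 1]
    return sum(s >= threshold for s in ood_scores) / len(ood_scores)


probs, labels = [0.9, 0.2, 0.8, 0.4], [1, 0, 1, 0]
print(brier_score(probs, labels), binary_nll(probs, labels))
print(fpr_at_95_tpr(list(range(1, 21)), [0, 1, 2, 3]))  # prints 0.5
```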

6. Applications, Variants, and Limitations

  • Vision-Language OOD Detection:

NFT enhances the separation between in- and out-of-distribution representations by transforming prompts/features, with KR regularization mitigating forgetting and enabling generalization to new classes and styles (Zhu et al., 26 Jul 2025).

  • Knowledge Graph Inference:

The stay-positive regularizer enables one-class training of relational embeddings (DistMult, SimplE) without negatives, providing better-calibrated confidence scores and efficient learning for large, sparse graphs (Hajimoradlou et al., 2022).

A plausible implication is that similar regularizers could be adapted to other domains where negative instance generation is difficult, expensive, or subject to label noise.

Limitations

KR-NFT performance depends on an effective estimate or specification of the prior $\psi$ and an appropriate regularization weight. Accuracy and calibration gains are demonstrated empirically, but new domains may require heuristic tuning. The approach is best suited to large, sparse prediction regimes in which negatives matter for training but are costly to generate or annotate.

7. Relation to Existing Methods and Future Directions

KR-NFT formally differentiates itself from:

  • Negative Sampling: Avoids synthetic negatives, sidestepping false negatives, sample selection bias, and computational cost.
  • Traditional Calibration: Employs an explicit global constraint, rather than post hoc calibrators.
  • Meta-Learning OOD Methods: Introduces a meta-network for image-conditional adaptation in NFT, increasing model robustness to shift.

This suggests that broader adoption, in settings such as LLMs, open-world classification, and general compositional reasoning, may be tractable. Extending the architecture and regularizer to structured or multimodal data, or to hierarchical priors, is an area of ongoing research interest.


References:
