
LUNE: LoRA-based Unlearning with Negative Examples

Updated 15 December 2025
  • The paper introduces a parameter-efficient framework that confines unlearning to low-dimensional LoRA modules guided by negative examples.
  • LUNE achieves targeted removal with a 10× reduction in computational cost while preserving overall model performance.
  • Empirical results validate robust unlearning in both language and medical imaging models, demonstrating high unlearning success and utility retention.

LoRA-based Unlearning with Negative Examples (LUNE) is a framework for targeted and efficient model unlearning that leverages Low-Rank Adaptation (LoRA) modules and negative example-driven supervision. LUNE aims to remove specific knowledge or behaviors from neural models—such as LLMs or medical image predictors—without incurring the computational and generalization costs of full model retraining or direct backbone weight editing. By confining trainable updates to a low-dimensional subspace and employing adversarial or negative-only fine-tuning, LUNE achieves controlled knowledge removal while minimizing collateral utility loss on retained capabilities (Liu et al., 8 Dec 2025, Datta et al., 20 Nov 2025).

1. Motivation and Background

In both vision and language settings, neural models accumulate extensive domain and factual knowledge from large datasets. Regulatory and practical requirements—including privacy compliance, bias mitigation, and continual dataset revisions—increasingly demand mechanisms to remove or update specific content post-training. Prior approaches, such as full model retraining on curated datasets (excluding or down-weighting the target knowledge), or direct memory-editing of the model weights, entail prohibitive computational costs and can induce “catastrophic unlearning” by globally degrading performance on unrelated tasks.

LUNE addresses this gap by introducing a selective, lightweight approach that (i) restricts updates to LoRA adapters while freezing the original backbone, and (ii) guides those updates using negative examples—inputs paired with outputs that explicitly refute, contradict, or replace the target knowledge. This design produces localized edits in parameter space and targets the erasure of undesired content with high sample and compute efficiency (Liu et al., 8 Dec 2025).

2. Core Methodology and Architectural Design

LUNE adopts a parameter-efficient fine-tuning paradigm based on Low-Rank Adaptation (LoRA). LoRA inserts lightweight, trainable matrices into selected layers (e.g., attention and feed-forward modules for LLMs; convolutional decoders in segmentation nets), leaving the bulk of pre-trained weights fixed. For a base weight matrix $W_0 \in \mathbb{R}^{d_{out} \times d_{in}}$, the adapted version is $W' = W_0 + AB^\top$, where $A \in \mathbb{R}^{d_{out}\times r}$, $B \in \mathbb{R}^{d_{in}\times r}$, and $r \ll \min(d_{out}, d_{in})$.
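
To make the mechanics concrete, below is a minimal PyTorch sketch of a LoRA-wrapped linear layer under this parameterization ($W' = W_0 + AB^\top$); the class name and initialization details are illustrative, not taken from a released implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update A @ B^T."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0,
                 dropout: float = 0.05):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze W_0 (and bias)
            p.requires_grad = False
        d_out, d_in = base.out_features, base.in_features
        # A in R^{d_out x r} starts at zero so the adapted model initially
        # matches the base model; B gets a small random init.
        self.A = nn.Parameter(torch.zeros(d_out, rank))
        self.B = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.scale = alpha / rank             # alpha = r gives scale 1
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W' x = W_0 x + (A B^T) x, computed without materializing A B^T
        return self.base(x) + self.dropout(x) @ self.B @ self.A.t() * self.scale
```

In an LLM, such wrappers would replace the projection and feed-forward linears of each Transformer block, as described below.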

Integration in LLMs and Medical Models

  • LLMs: LoRA adapters are inserted into query/key/value/output projections and feed-forward submodules of each Transformer block. All original model parameters $\theta$ are frozen, and only the LoRA parameters $\phi = \{A, B\}$ are updated using negative-example supervision.
  • Medical Image Segmentation: The LoRA adapters are applied to decoder convolution layers and the segmentation head. A teacher-student framework is used: the full-capacity teacher is frozen, and its outputs or features serve as the reference for the student's updates (Datta et al., 20 Nov 2025).

A strong unlearning phase adversarially updates LoRA parameters to suppress, contradict, or increase uncertainty on the forget set, followed by a gentle restoration phase that recovers generalization on retained data by updating only the final head.

3. Negative Example Construction and Supervision

Negative example design is central to LUNE's effectiveness.

  • LLMs: For each fact or behavior to be forgotten, negative completions are synthesized that either (a) state explicit contradictions, (b) propose plausible but incorrect alternatives, or (c) use paraphrased forms to maximize generalization. Candidate negatives are filtered to eliminate uncertain or hedged completions, and diversity is maintained to avoid paraphrase dominance (Liu et al., 8 Dec 2025). No positive (“retain”) supervision is needed.
  • Medical Models: The forget set comprises inputs whose outputs or features are to be erased. Supervisory losses include label flips (forcing the probability distribution away from the ground truth), explicit teacher contradiction, and entropy maximization, all applied exclusively to the forget set $\mathcal{D}_f$ (Datta et al., 20 Nov 2025).

This negative-only approach enables targeted removal with minimal impact on unrelated functionality, as confirmed by ablations comparing LUNE with random or irrelevant negatives.
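
A hedged sketch of the filtering step for LLM negatives follows; the marker list, similarity threshold, and helper names (overlap, build_negative_set) are hypothetical stand-ins for the paper's actual filters.

```python
HEDGE_MARKERS = ("might", "may", "possibly", "not sure", "unclear")

def overlap(a: str, b: str) -> float:
    """Crude token-level Jaccard similarity, used here as a diversity check."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(1, len(ta | tb))

def build_negative_set(prompt: str, candidates: list[str]) -> list[dict]:
    """Keep only confident, mutually diverse negative completions y^-."""
    kept: list[dict] = []
    for y_neg in candidates:
        if any(m in y_neg.lower() for m in HEDGE_MARKERS):
            continue                          # drop uncertain/hedged completions
        if any(overlap(y_neg, k["completion"]) > 0.8 for k in kept):
            continue                          # avoid paraphrase dominance
        kept.append({"prompt": prompt, "completion": y_neg})
    return kept
```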

4. Training Procedures, Losses, and Hyperparameters

LLMs

The unlearning objective fine-tunes only the LoRA parameters $\phi$ to maximize the likelihood of the negative completions over the negative dataset $\mathcal{D}_{neg}$:

$$L(\phi) = -\sum_{(x,\, y^-) \in \mathcal{D}_{neg}} \log P_{\theta, \phi}(y^- \mid x).$$

  • Optimization: AdamW with learning rate $2\times 10^{-4}$, weight decay $0.01$, and mixed precision.
  • LoRA settings: default rank $r=16$ ($r\in\{2,4,8,16,32\}$ tested in ablation), scaling $\alpha = r$, dropout $0.05$.
  • Early stopping is based on convergence of the Unlearning Success Rate (USR) and General Utility Retention (GUR) metrics (Liu et al., 8 Dec 2025).
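
A minimal training-step sketch implementing the loss above, assuming a Hugging Face-style causal LM whose backbone is frozen and whose LoRA parameters remain trainable; the data handling is illustrative.

```python
from torch.optim import AdamW

def make_optimizer(model):
    """AdamW over the trainable (LoRA) parameters, per the reported settings."""
    params = [p for p in model.parameters() if p.requires_grad]
    return AdamW(params, lr=2e-4, weight_decay=0.01)

def unlearning_step(model, optimizer, input_ids, labels):
    """One gradient step of L(phi) = -sum log P(y^- | x).

    `labels` should mask prompt tokens with -100 so the built-in
    cross-entropy covers only the negative completion y^-.
    """
    out = model(input_ids=input_ids, labels=labels)  # loss = -mean log P(y^-|x)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```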

Medical Segmentation

  • Strong unlearning loss ($L_{asc}$) combines:
    • Label-flip ($L_{flip}$),
    • Teacher-contradiction ($L_{tc}$),
    • Entropy maximization ($L_{ent}$),
    • Feature repulsion ($L_{rep}$),
    • Mean-probability regularization ($L_{mean}$),
    • Total variation ($L_{tv}$).
  • LoRA hyperparameters: rank $r=8$, dropout $0.05$, learning rates $\eta_\phi = 10^{-4}$ (LoRA) and $\eta_\psi = 5\times 10^{-5}$ (head).
  • Batch sizes and step counts are dataset-dependent.
  • The gentle restoration phase applies supervised, distillation, and guard losses only to the segmentation head (Datta et al., 20 Nov 2025).
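
For intuition, here is a minimal sketch of two of the ascent terms above ($L_{ent}$ and $L_{flip}$) for a per-pixel softmax segmentation output; the exact formulations and weightings are assumptions, not the paper's implementation (the remaining terms, e.g. $L_{mean}$ and $L_{tv}$, regularize this otherwise unbounded objective).

```python
import torch
import torch.nn.functional as F

def entropy_maximization(logits: torch.Tensor) -> torch.Tensor:
    """L_ent (assumed form): negative mean per-pixel entropy, so minimizing
    it raises the model's uncertainty on forget-set inputs."""
    p = logits.softmax(dim=1)                              # (B, C, H, W)
    ent = -(p * p.clamp_min(1e-8).log()).sum(dim=1)        # per-pixel entropy
    return -ent.mean()

def label_flip(logits: torch.Tensor, y_true: torch.Tensor) -> torch.Tensor:
    """L_flip (assumed form): push probability mass away from the true class."""
    log_p = F.log_softmax(logits, dim=1)                   # (B, C, H, W)
    # minimizing +log P(y_true) drives the true-class probability down
    return log_p.gather(1, y_true.unsqueeze(1)).mean()

def strong_unlearning_loss(logits, y_true, w_ent=1.0, w_flip=1.0):
    return w_ent * entropy_maximization(logits) + w_flip * label_flip(logits, y_true)
```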

5. Evaluation Metrics and Experimental Results

LUNE is evaluated using a suite of metrics quantifying both unlearning efficacy and utility retention.

| Metric | Description | Reference |
|---|---|---|
| USR | Fraction of prompts where the undesired output is absent (LLM) | (Liu et al., 8 Dec 2025) |
| GUR | Ratio of general-domain performance post-unlearning to baseline | (Liu et al., 8 Dec 2025) |
| APR | Robustness to adversarial or paraphrased prompts | (Liu et al., 8 Dec 2025) |
| MIA | Membership inference attack accuracy (lower is better) | (Liu et al., 8 Dec 2025) |
| $\Delta_\mathrm{forget}$ | IoU drop on the forget set (vision) | (Datta et al., 20 Nov 2025) |
| $\Delta_\mathrm{retain}$ | IoU drop on the retain set (vision) | (Datta et al., 20 Nov 2025) |
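
The two headline LLM metrics are simple ratios; an illustrative computation follows (the contains_forgotten predicate is a hypothetical string- or judge-based check, not from the paper).

```python
def usr(responses: list[str], contains_forgotten) -> float:
    """Unlearning Success Rate: fraction of prompts with no undesired output."""
    return sum(not contains_forgotten(r) for r in responses) / len(responses)

def gur(general_score_after: float, general_score_before: float) -> float:
    """General Utility Retention: post-unlearning general-domain score / baseline."""
    return general_score_after / general_score_before
```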

Quantitative results indicate that LUNE achieves state-of-the-art GUR across benchmarks (e.g., 95.1% on EDU-RELAT, 93.7% on RWKU), high USR (88–92%), and the lowest MIA in most settings (Liu et al., 8 Dec 2025). In medical segmentation, forget-set IoU drops from 0.875 to 0.509 on ISIC, while retain-set IoU remains nearly stable (0.677 before unlearning vs. 0.647 after), demonstrating selective forgetting (Datta et al., 20 Nov 2025).

Computational costs are reduced by 10× compared to full fine-tuning, as LoRA parameters comprise only $10^{-3}$–$10^{-2}$ of total model weights, with proportional reductions in optimizer state and gradient updates (Liu et al., 8 Dec 2025).
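
That parameter fraction is easy to verify for any wrapped model; a quick sketch (not the paper's accounting):

```python
def lora_fraction(model) -> float:
    """Trainable (LoRA) parameters as a fraction of all parameters."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total   # on the order of 1e-3 to 1e-2 in LUNE-style setups
```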

6. Theoretical Implications and Limitations

Restricting updates to a low-rank subspace (via LoRA) localizes parameter changes, minimizing the risk of catastrophic forgetting or unintended utility loss. Negative supervision targets only the desired conceptual region in function space, with empirical ablations showing clear superiority over random or non-contradictory negatives for all key metrics.

Limitations include:

  • Over-suppression, where negative fine-tuning may marginally impair related knowledge (though GUR remains high).
  • Current experiments focus on single-fact or attribute forgetting; extension to more complex, compositional, or multi-instance domains is ongoing.
  • The robustness of unlearning depends on the quality and breadth of negative example construction, which may not capture all adversarial paraphrases or hidden dependencies (Liu et al., 8 Dec 2025).

A plausible implication is that future LUNE variants could incorporate automated relevance feedback, multi-instance unlearning, and more sophisticated low-rank regularization to further confine updates and scale to continual learning scenarios.

7. Applications, Extensions, and Research Directions

Applications of LUNE span:

  • Medical imaging: Selective erasure of sensitive anatomical features, lesion classes, or entire sample sets from segmentation/classification networks (Datta et al., 20 Nov 2025).
  • LLMs: Removal or suppression of specific facts, personal data, or biased constructs, with direct relevance for privacy and knowledge correction tasks (Liu et al., 8 Dec 2025).
  • Other vision/NLP domains: Adaptation to object detection (e.g., zeroing confidence on specific bounding boxes) and attention-layer LoRA for targeted fact erasure.

Ongoing research explores automated negative dataset generation, continual unlearning within a single model instance, and theoretical analysis of how LoRA’s low-rank constraints bound functional drift and semantic “leakage.” Expanding LUNE to handle broader conceptual unlearning and abstract knowledge remains an open challenge.
