AUVIC: Adversarial Unlearning in MLLMs

Updated 21 November 2025
  • AUVIC is a framework for fine-grained erasure of visual concepts from MLLMs, achieving selective forgetting with minimal collateral damage.
  • It uses adversarial perturbation with dynamic anchor preservation to target specific visual concepts without disrupting related entities.
  • Evaluated on VCUBench, AUVIC demonstrates high precision through metrics like TFA, NTRA, and GRF-F₁, balancing erasure and retention effectively.

AUVIC, introduced in "AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal LLMs" (Chen et al., 14 Nov 2025), is a framework for fine-grained erasure of visual concepts from Multimodal LLMs (MLLMs). The method enables selective forgetting of specific visual concepts while minimizing degradation on non-target entities, addressing regulatory and ethical requirements such as the GDPR "right to be forgotten." AUVIC is evaluated using VCUBench, the inaugural benchmark designed for precision assessment of targeted visual concept unlearning in both single- and group-photo settings.

1. Motivation and Research Context

MLLMs are increasingly deployed across domains leveraging massive, uncurated image–text corpora. These datasets routinely contain sensitive, private, or copyrighted visual concepts—such as individual faces—raising substantial data privacy concerns. Regulatory frameworks mandate capabilities for post-hoc selective concept erasure without costly full retraining. While text unlearning has matured, the precise removal of one visual concept in a multi-entity context, especially under constraints to avoid collateral forgetting, is an unresolved technical challenge. Existing benchmarks either focus exclusively on text or only on single-object removal and neglect group scenarios and retention assessment for non-target concepts. VCUBench is developed to systematically fill these evaluation gaps.

2. Task Formalization and Metric Definitions

Let $\mathbb{I}$ denote a set of images and $\mathcal{C} = \{c_1,\dots,c_K\}$ the set of identities (visual concepts). An MLLM $f_\theta$ is queried with image $x \in \mathbb{I}$ and textual prompt $q$, generating a label $\hat{y} = f_\theta(x, q) \in \mathcal{C} \cup \{\text{unknown}\}$ or a free-form caption. Given a target concept $c_t \in \mathcal{C}$, unlearning aims at:

  • Maximizing forgetting of $c_t$, such that $f_\theta(x, q) \ne c_t$,
  • Maximizing retention of all $c' \in \mathcal{C} \setminus \{c_t\}$.

Forgetting Rate (FR) and Retention Rate (RR) are defined generically as

$$\mathrm{FR}(c_t) = 1 - \frac{\mathrm{Accuracy}_{\mathrm{after}}(c_t)}{\mathrm{Accuracy}_{\mathrm{before}}(c_t)}, \qquad \mathrm{RR}(c') = \frac{\mathrm{Accuracy}_{\mathrm{after}}(c')}{\mathrm{Accuracy}_{\mathrm{before}}(c')}.$$

VCUBench instantiates six principal metrics (a computational sketch follows the list):

  • Target Forgetting Accuracy (TFA): Fraction of group images containing $c_t$ on which $f_\theta$ fails to produce $c_t$.
  • Non-Target Retain Accuracy (NTRA): Fraction of group images without $c_t$ on which all other present identities are named correctly.
  • Group Retain–Forget $F_1$ (GRF-F₁): Harmonic mean of TFA and NTRA,

    $$\text{GRF-F}_1 = \frac{2\,\text{TFA} \times \text{NTRA}}{\text{TFA} + \text{NTRA}}$$

  • Efficacy (E): Fraction of single-person images of $c_t$ on which it is not recognized.
  • Generality: Performance on an unrelated held-out ScienceVQA split.
  • Perplexity (PPL): Masked fluency score for caption outputs, excluding deliberately-forgotten tokens.
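
To make these definitions concrete, the rates and the harmonic mean reduce to a few lines of arithmetic. A minimal sketch, with illustrative function names (not from the paper):

```python
def forgetting_rate(acc_before: float, acc_after: float) -> float:
    """FR(c_t): relative accuracy drop on the target concept after unlearning."""
    return 1.0 - acc_after / acc_before

def retention_rate(acc_before: float, acc_after: float) -> float:
    """RR(c'): fraction of pre-unlearning accuracy retained on a non-target."""
    return acc_after / acc_before

def grf_f1(tfa: float, ntra: float) -> float:
    """GRF-F1: harmonic mean of Target Forgetting Accuracy (TFA) and
    Non-Target Retain Accuracy (NTRA)."""
    return 2.0 * tfa * ntra / (tfa + ntra)

# Sanity check against the results table in Section 4: AUVIC reports
# TFA = 93.6 and NTRA = 83.2, which should give a GRF-F1 of about 88.1.
print(f"{grf_f1(93.6, 83.2):.1f}")  # -> 88.1
```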

3. VCUBench Dataset and Benchmark Construction

VCUBench comprises $K = 5$ public identities. For each, four disjoint image sets are collected:

  • Target-Single: Single-person portraits of $c_t$,
  • Non-Target-Single: Single-person portraits of the other four identities,
  • Target-Group: Group images containing $c_t$,
  • Non-Target-Group: Group scenes containing none of the five identities.

Label filtering uses an off-the-shelf MLLM (LLaVA-1.5), yielding approximately 15,000 image–question–answer triples. Positive samples correspond to images in which a concept $c$ appears, negatives to images in which it is absent. VCUBench does not mandate train/val/test splits for unlearning; generality is instead evaluated on a held-out ScienceVQA split. All five concepts serve as unlearning targets in round-robin fashion.
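
A minimal sketch of how the four sets could be materialized from annotated triples; the `VCUSample` fields and the filtering logic are assumptions for illustration, not the released data format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VCUSample:
    image_path: str
    question: str               # e.g. "Who appears in this photo?"
    answer: str                 # gold identity label(s), or "unknown"
    identities: frozenset[str]  # identities visibly present in the image

def split_for_target(samples: list[VCUSample], target: str,
                     known: frozenset[str]) -> dict[str, list[VCUSample]]:
    """Partition samples into the four disjoint VCUBench sets for one target."""
    single = [s for s in samples if len(s.identities) == 1]
    group = [s for s in samples if len(s.identities) > 1]
    return {
        "target_single": [s for s in single if target in s.identities],
        "non_target_single": [s for s in single if s.identities <= known - {target}],
        "target_group": [s for s in group if target in s.identities],
        # Group scenes containing none of the five benchmark identities:
        "non_target_group": [s for s in group if s.identities.isdisjoint(known)],
    }
```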

4. Evaluation Protocol and Compared Algorithms

The evaluation task is visual question answering: on (image, prompt) pairs, models must identify present identities. Five methods are compared:

  • GA (Gradient Ascent unlearning),
  • PO (Preference Optimization; encourages abstention),
  • GA+KL (GA regularized by KL-divergence for stability),
  • SIU (Single Image Unlearning; per-class erasure baseline),
  • AUVIC (adversarial perturbation + dynamic anchor preservation).

Protocols (a schematic loop is sketched after the list):

  1. For each algorithm and target $c_t$, apply unlearning to obtain $f_{\theta_0} \to f_\theta^{(t)}$.
  2. Evaluate the metrics (TFA, NTRA, GRF-F₁, Efficacy, Generality, PPL).
  3. Optionally, compute FR/RR for each non-target $c'$ to analyze collateral forgetting.
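
The three protocol steps amount to a simple round-robin loop. In this schematic, `unlearn` and `evaluate` stand in for any of the five algorithms and the VCUBench metric suite; all names are illustrative:

```python
from copy import deepcopy

def run_benchmark(base_model, concepts, unlearn, evaluate):
    """Round-robin protocol: each concept is unlearned from a fresh copy of
    the base model, and the edited model is scored on the full metric suite."""
    results = {}
    for c_t in concepts:                              # e.g. the five identities
        model_t = unlearn(deepcopy(base_model), c_t)  # f_theta0 -> f_theta^(t)
        results[c_t] = evaluate(model_t, c_t)         # TFA, NTRA, GRF-F1, ...
    return results
```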
Reported results (higher is better for all metrics except PPL):

| Method | TFA (%) | NTRA (%) | GRF-F₁ (%) | Efficacy (%) | Generality (%) | PPL |
|---|---|---|---|---|---|---|
| GA | 84.5 | 30.2 | 44.5 | 89.2 | 63.1 | 16.4 |
| PO | 49.1 | 54.5 | 51.7 | 80.4 | 62.9 | 7.58 |
| GA+KL | 85.9 | 26.6 | 40.6 | 90.6 | 63.0 | 8.92 |
| SIU | 92.3 | 63.5 | 75.3 | 100.0 | 61.2 | 11.3 |
| AUVIC (Ours) | 93.6 | 83.2 | 88.1 | 97.9 | 63.1 | 8.14 |

AUVIC achieves the highest joint retain–forget $F_1$ (88.1%) together with strong single-image efficacy (97.9%), while keeping both collateral damage and fluency degradation low (PPL 8.14, second only to PO). Across every unlearning target, AUVIC maintains the strongest retain–forget trade-off, evidencing robustness.

5. Algorithmic Approach: Adversarial Unlearning

AUVIC operationalizes fine-grained forgetting by imposing adversarial perturbations onto the parameters, focusing the unlearning on the target concept while leveraging anchor preservation to minimize disruption to related entities. This mechanism strategically decouples the erasure objective from retention constraints, enabling surgical modification of the model’s latent space. Unlike naive methods (e.g., GA, PO) that induce widespread performance collapse or excess abstention, AUVIC sharpens the suppression of only $c_t$, controlling side effects quantified by NTRA on related identities.
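
The paper's exact objective is not reproduced here. As a hedged illustration of the mechanism described above, one plausible shape for a single update combines a perturbation-driven forget term with an anchor-preservation term. Everything below (the classifier-style interface, the input-space FGSM step standing in for the method's perturbation, the MSE anchor, and the weight `lam`) is an assumption for exposition:

```python
import torch
import torch.nn.functional as F

def adversarial_unlearning_loss(model, target_imgs, target_lbls,
                                anchor_imgs, anchor_logits,
                                lam: float = 1.0, eps: float = 8 / 255):
    """Illustrative loss: suppress the target concept under an adversarial
    image perturbation while anchoring non-target predictions to the frozen
    pre-unlearning model's outputs (anchor_logits)."""
    # 1. One-step perturbation: move pixels in the direction that most
    #    affects recognition of the target concept.
    x = target_imgs.clone().requires_grad_(True)
    ce = F.cross_entropy(model(x), target_lbls)
    (grad,) = torch.autograd.grad(ce, x)
    x_adv = (x + eps * grad.sign()).detach()

    # 2. Forget term: gradient ascent on the target concept's loss.
    forget = -F.cross_entropy(model(x_adv), target_lbls)

    # 3. Anchor term: keep outputs on related, non-target entities close
    #    to the pre-unlearning model's logits.
    retain = F.mse_loss(model(anchor_imgs), anchor_logits)

    return forget + lam * retain
```

In practice the anchor set would contain the related identities whose retention NTRA measures, so the retention constraint is enforced exactly where collateral forgetting is most likely.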

6. Future Directions and Extensions

The VCUBench/AUVIC framework suggests several research frontiers:

  • Expanding concept coverage: Incorporation of non-public identities (private faces), objects, and logos.
  • New computer vision tasks: Instance segmentation for mask erasure, image retrieval for index-level unlearning, bounding box removal.
  • Context and group complexity: Varying group sizes, background diversity to examine context reliance.
  • Security evaluation: Integration of membership-inference attacks to audit extractability post-unlearning.
  • Fairness/demographic balance: Analyzing and constraining disproportionate side effects across protected subgroups.

VCUBench’s multidimensional metric suite and extensible structure provide a principled basis for testing regulatory compliance, algorithmic precision, and the operational boundaries of unlearning in MLLMs.

7. Significance and Implications

AUVIC, in conjunction with VCUBench, represents the first standardized benchmarking environment for targeted visual concept unlearning in single- and multi-entity imagery. Its tripartite assessment—forgetting accuracy, retention stability, and caption fluency—reveals essential algorithmic trade-offs and side-effects. The empirical results confirm that properly constructed adversarial unlearning approaches sharply localize erasure while protecting non-target performance. This innovation advances compliance with privacy mandates and offers a blueprint for ongoing technical and normative evolution in the field of MLLM safety and governance (Chen et al., 14 Nov 2025).

References

  1. Chen et al. "AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal LLMs." 14 November 2025.