AUVIC: Adversarial Unlearning in MLLMs
- AUVIC is a framework for fine-grained erasure of visual concepts from MLLMs, achieving selective forgetting with minimal collateral damage.
- It uses adversarial perturbation with dynamic anchor preservation to target specific visual concepts without disrupting related entities.
- Evaluated on VCUBench, AUVIC demonstrates high precision through metrics like TFA, NTRA, and GRF-F₁, balancing erasure and retention effectively.
AUVIC, introduced in "AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal LLMs" (Chen et al., 14 Nov 2025), is a framework for fine-grained erasure of visual concepts from Multimodal LLMs (MLLMs). The method enables selective forgetting of specific visual concepts while minimizing degradation on non-target entities, addressing regulatory and ethical requirements such as the GDPR "right to be forgotten." AUVIC is evaluated using VCUBench, the inaugural benchmark designed for precision assessment of targeted visual concept unlearning in both single- and group-photo settings.
1. Motivation and Research Context
MLLMs, trained on massive, uncurated image–text corpora, are increasingly deployed across domains. These corpora routinely contain sensitive, private, or copyrighted visual concepts, such as individual faces, raising substantial data-privacy concerns. Regulatory frameworks mandate capabilities for post-hoc selective concept erasure without costly full retraining. While text unlearning has matured, precisely removing a single visual concept in a multi-entity context, especially under constraints to avoid collateral forgetting, remains an unresolved technical challenge. Existing benchmarks either focus exclusively on text or only on single-object removal, neglecting group scenarios and retention assessment for non-target concepts. VCUBench is developed to systematically fill these evaluation gaps.
2. Task Formalization and Metric Definitions
Let $\mathcal{X}$ denote a set of images and $\mathcal{C}$ the set of identities (visual concepts). An MLLM $f_\theta$ is queried with an image $x \in \mathcal{X}$ and a textual prompt $q$, generating an identity label or free-form caption $f_\theta(x, q)$. Given a target concept $c^\ast \in \mathcal{C}$, unlearning produces parameters $\theta'$ aiming at:
- Maximizing forgetting of $c^\ast$, such that $c^\ast \notin f_{\theta'}(x, q)$ on images containing $c^\ast$,
- Maximizing retention of all $c \in \mathcal{C} \setminus \{c^\ast\}$.
Forgetting Rate (FR) and Retention Rate (RR) are defined generically as

$$\mathrm{FR}(c^\ast) = \frac{1}{|\mathcal{X}_{c^\ast}|} \sum_{x \in \mathcal{X}_{c^\ast}} \mathbb{1}\big[c^\ast \notin f_{\theta'}(x, q)\big], \qquad \mathrm{RR}(c) = \frac{1}{|\mathcal{X}_{c}|} \sum_{x \in \mathcal{X}_{c}} \mathbb{1}\big[c \in f_{\theta'}(x, q)\big],$$

where $\mathcal{X}_c$ denotes the images containing concept $c$. VCUBench instantiates six principal metrics (a minimal computation sketch follows the list):
- Target Forgetting Accuracy (TFA): Fraction of group images containing $c^\ast$ on which the unlearned model fails to produce $c^\ast$.
- Non-Target Retain Accuracy (NTRA): Fraction of group images without $c^\ast$ on which all present (non-target) identities are named correctly.
- Group Retain–Forget (GRF-F₁): Harmonic mean of TFA and NTRA, $\mathrm{GRF\text{-}F_1} = \frac{2 \cdot \mathrm{TFA} \cdot \mathrm{NTRA}}{\mathrm{TFA} + \mathrm{NTRA}}$.
- Efficacy (E): Fraction of single-person images of $c^\ast$ on which it is not recognized.
- Generality: Performance on an unrelated held-out ScienceVQA split.
- Perplexity (PPL): Masked fluency score for caption outputs, excluding deliberately forgotten tokens.
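These metrics reduce to simple set operations over model outputs. The following minimal Python sketch assumes `preds` holds the per-image sets of names emitted by the unlearned model and `gts` the ground-truth identity sets; these containers are illustrative, not part of the benchmark's API:

```python
def tfa(preds, target):
    """Target Forgetting Accuracy: fraction of target-containing group
    images on which the unlearned model no longer names the target."""
    return sum(target not in p for p in preds) / len(preds)

def ntra(preds, gts):
    """Non-Target Retain Accuracy: fraction of group images on which every
    ground-truth (non-target) identity is still named."""
    return sum(set(gt) <= set(p) for p, gt in zip(preds, gts)) / len(preds)

def grf_f1(tfa_pct, ntra_pct):
    """GRF-F1: harmonic mean of TFA and NTRA (both in %)."""
    s = tfa_pct + ntra_pct
    return 2 * tfa_pct * ntra_pct / s if s else 0.0

# Sanity check against the GA row of the results table in Section 4:
# grf_f1(84.5, 30.2) returns 44.49..., matching the reported 44.5.
```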
3. VCUBench Dataset and Benchmark Construction
VCUBench comprises five public identities. For each, four disjoint sets are collected:
- Target-Single: Single-person portraits of $c^\ast$,
- Non-Target-Single: Single portraits of the other four identities,
- Target-Group: Group images containing $c^\ast$,
- Non-Target-Group: Group scenes containing none of the five targets.
Label filtering uses an off-the-shelf MLLM (LLaVA-1.5), yielding approximately 15,000 image–question–answer triples. Positive samples correspond to images in which $c^\ast$ appears, negatives to images in which it is absent. VCUBench does not mandate train/val/test splits for unlearning; generality is evaluated on ScienceVQA. All five concepts serve as unlearning targets in round-robin fashion.
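A hypothetical sketch of this filtering step follows; `query_mllm` stands in for the off-the-shelf labeler (LLaVA-1.5 in the paper) and is an assumed interface, and the question template is illustrative:

```python
def build_triples(images, expected_identities, query_mllm,
                  question="Who appears in this photo?"):
    """Keep only images whose labeler-predicted identities agree with the
    intended split, yielding (image, question, answer) triples."""
    triples = []
    for img in images:
        named = set(query_mllm(img, question))    # labeler's predicted names
        if named == set(expected_identities):     # discard inconsistently labeled images
            triples.append((img, question, sorted(named)))
    return triples
```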
4. Evaluation Protocol and Compared Algorithms
The evaluation task is visual question answering: on (image, prompt) pairs, models must identify present identities. Five methods are compared:
- GA (Gradient Ascent unlearning),
- PO (Preference Optimization; encourages abstention),
- GA+KL (GA regularized by KL-divergence for stability; schematic objectives for the gradient-based baselines follow this list),
- SIU (Single Image Unlearning; per-class erasure baseline),
- AUVIC (adversarial perturbation + dynamic anchor preservation).
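As a schematic of the two gradient-based baselines (notation as in Section 2; the paper's exact loss formulations may differ), GA ascends the recognition loss on forget samples, and GA+KL adds a divergence penalty against the original model $p_{\theta_0}$ on retained data:

$$\mathcal{L}_{\mathrm{GA}} = -\,\mathcal{L}_{\mathrm{CE}}\big(f_\theta(x, q),\, c^\ast\big), \qquad \mathcal{L}_{\mathrm{GA+KL}} = \mathcal{L}_{\mathrm{GA}} + \lambda\, \mathrm{KL}\big(p_{\theta_0}(\cdot \mid x', q') \,\Vert\, p_\theta(\cdot \mid x', q')\big),$$

where $(x, q)$ ranges over forget samples containing $c^\ast$ and $(x', q')$ over retained samples.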
Protocols:
- For each algorithm and target $c^\ast$, apply unlearning of $c^\ast$ to the base model.
- Evaluate metrics (TFA, NTRA, GRF-F₁, Efficacy, Generality, PPL).
- Optionally, compute FR/RR for each non-target $c$ to analyze collateral forgetting (the driver loop is sketched below).
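The protocol amounts to the following round-robin driver, an illustrative sketch in which `methods` maps algorithm names to unlearning functions and `evaluate` stands in for the VCUBench metric suite (both are assumed interfaces):

```python
import copy

def run_benchmark(base_model, targets, methods, evaluate):
    """Round-robin protocol: every concept takes a turn as the unlearning
    target, for every compared algorithm."""
    results = {}
    for name, unlearn_fn in methods.items():
        for target in targets:
            # Each run starts from a fresh copy of the base model.
            model = unlearn_fn(copy.deepcopy(base_model), target)
            # evaluate() is assumed to return TFA, NTRA, GRF-F1, Efficacy,
            # Generality, and PPL for this (method, target) pair.
            results[(name, target)] = evaluate(model, target)
    return results
```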
| Method | TFA (%) ↑ | NTRA (%) ↑ | GRF-F₁ (%) ↑ | Efficacy (%) ↑ | Generality (%) ↑ | PPL ↓ |
|---|---|---|---|---|---|---|
| GA | 84.5 | 30.2 | 44.5 | 89.2 | 63.1 | 16.4 |
| PO | 49.1 | 54.5 | 51.7 | 80.4 | 62.9 | 7.58 |
| GA+KL | 85.9 | 26.6 | 40.6 | 90.6 | 63.0 | 8.92 |
| SIU | 92.3 | 63.5 | 75.3 | 100.0 | 61.2 | 11.3 |
| AUVIC (Ours) | 93.6 | 83.2 | 88.1 | 97.9 | 63.1 | 8.14 |
AUVIC achieves the highest joint retain–forget score (88.1% GRF-F₁) together with near-perfect single-image efficacy (97.9%), while keeping collateral damage and fluency loss low (83.2% NTRA; 8.14 PPL). Across all unlearning targets, AUVIC remains superior on the joint retain–forget metric, evidencing robustness.
5. Algorithmic Approach: Adversarial Unlearning
AUVIC operationalizes fine-grained forgetting by imposing adversarial perturbations that focus the unlearning on the target concept, while dynamic anchor preservation minimizes disruption to related entities. This mechanism decouples the erasure objective from the retention constraints, enabling surgical modification of the model's latent space. Unlike naive methods (e.g., GA, PO), which induce widespread performance collapse or excessive abstention, AUVIC suppresses only $c^\ast$, with side effects on related identities controlled and quantified by NTRA.
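Since the paper's exact formulation is not reproduced here, the PyTorch sketch below illustrates one plausible reading of the recipe under explicit assumptions: an FGSM-style input perturbation exposes the direction along which the model is most sensitive to the target concept, a forgetting term suppresses the target on the perturbed input, and a KL penalty against the frozen original model stands in for dynamic anchor preservation. `ToyRecognizer` is a stand-in for the MLLM's recognition pathway, and all names and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyRecognizer(nn.Module):
    """Stand-in for the MLLM's vision-to-identity pathway (illustrative)."""
    def __init__(self, dim=64, num_concepts=10):
        super().__init__()
        self.head = nn.Linear(dim, num_concepts)

    def forward(self, feats):
        return self.head(feats)

def auvic_style_step(model, frozen_model, x_forget, target_id, x_anchor,
                     opt, eps=0.03, lam=1.0):
    # (1) FGSM-style perturbation: find the input direction that most
    #     increases the target-concept logit (assumed mechanism).
    delta = torch.zeros_like(x_forget, requires_grad=True)
    model(x_forget + delta)[:, target_id].sum().backward()
    delta = (eps * delta.grad.sign()).detach()
    opt.zero_grad()  # discard model gradients accumulated above

    # (2) Forgetting term: minimize the target-concept log-probability
    #     on the adversarially perturbed forget images.
    forget_loss = F.log_softmax(model(x_forget + delta), dim=-1)[:, target_id].mean()

    # (3) Anchor preservation: keep predictions on related non-target
    #     samples close to those of the frozen original model.
    with torch.no_grad():
        ref = F.softmax(frozen_model(x_anchor), dim=-1)
    anchor_loss = F.kl_div(F.log_softmax(model(x_anchor), dim=-1),
                           ref, reduction="batchmean")

    loss = forget_loss + lam * anchor_loss
    loss.backward()
    opt.step()
    return loss.item()

# Illustrative usage with random features in place of real image embeddings:
model, frozen = ToyRecognizer(), ToyRecognizer()
frozen.load_state_dict(model.state_dict())
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
auvic_style_step(model, frozen, torch.randn(8, 64), target_id=3,
                 x_anchor=torch.randn(8, 64), opt=opt)
```

The decoupled structure, in which the forgetting gradient acts only through adversarially localized inputs while the anchor term pins down behavior elsewhere, is what permits erasure without the broad collapse seen with plain gradient ascent.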
6. Future Directions and Extensions
The VCUBench/AUVIC framework suggests several research frontiers:
- Expanding concept coverage: Incorporation of non-public identities (private faces), objects, and logos.
- New computer vision tasks: Instance segmentation for mask erasure, image retrieval for index-level unlearning, bounding box removal.
- Context and group complexity: Varying group sizes and background diversity to examine context reliance.
- Security evaluation: Integration of membership-inference attacks to audit extractability post-unlearning.
- Fairness/demographic balance: Analyzing and constraining disproportionate side effects across protected subgroups.
VCUBench’s multidimensional metric suite and extensible structure provide a principled basis for testing regulatory compliance, algorithmic precision, and the operational boundaries of unlearning in MLLMs.
7. Significance and Implications
AUVIC, in conjunction with VCUBench, represents the first standardized benchmarking environment for targeted visual concept unlearning in single- and multi-entity imagery. Its tripartite assessment—forgetting accuracy, retention stability, and caption fluency—reveals essential algorithmic trade-offs and side-effects. The empirical results confirm that properly constructed adversarial unlearning approaches sharply localize erasure while protecting non-target performance. This innovation advances compliance with privacy mandates and offers a blueprint for ongoing technical and normative evolution in the field of MLLM safety and governance (Chen et al., 14 Nov 2025).