
S-MLLMUn Bench: Selective Unlearning in MLLMs

Updated 2 December 2025
  • The paper introduces S-MLLMUn Bench, which rigorously evaluates selective unlearning by quantifying both forgetting efficacy and capability retention in MLLMs.
  • It employs a dual-protocol approach by partitioning tasks into 'forget' and 'retain' sets, ensuring selective erasure of privacy-sensitive details without impairing overall visual reasoning.
  • Experimental results show that the SMFA method uniquely preserves over 95% of general capabilities while achieving high forgetting scores, even in challenging medical image contexts.

S-MLLMUn Bench is the first benchmark specifically designed to systematically evaluate the selective unlearning capabilities of multimodal LLMs (MLLMs). It explicitly measures the fundamentally competing objectives of erasing privacy-sensitive knowledge while preserving general visual reasoning and multimodal understanding. Through a rigorously structured protocol, dual-criteria evaluation, and synthetic data generation, S-MLLMUn Bench establishes a comprehensive framework for the advancement of “benign forgetting” in MLLMs (Zeng et al., 25 Nov 2025).

1. Motivation and Foundational Objectives

Multimodal LLMs trained on large-scale image-text corpora routinely memorize privacy-sensitive information, such as synthetic faces, ophthalmic images, and associated textual attributes. Existing machine unlearning methods typically overgeneralize: the same updates that remove private knowledge often degrade unrelated abilities, including scene description and medical image interpretation. S-MLLMUn Bench addresses this problem by introducing the first evaluation protocol that jointly quantifies:

  • Forgetting Efficacy: The depth and precision with which a model erases targeted sensitive knowledge.
  • Capability Retention: The preservation of foundational multimodal reasoning on visual and textual queries orthogonal to the unlearning target.

By enforcing dual evaluation via a tightly coupled forget/retain split, S-MLLMUn Bench constrains the design space toward methods achieving erasure without collateral harm, operationalizing the goal of benign memory forgetting (Zeng et al., 25 Nov 2025).

2. Benchmark Structure and Task Design

S-MLLMUn Bench decomposes the selective unlearning assessment into three complementary tasks:

  • Image Memory (Forget/Retain): Models are queried for private attributes associated with a synthetic face or ophthalmic image. In the “forget” set, models should not output the sensitive information; in the “retain” set, correct recall is required.
  • Text Memory (Forget/Retain): Models are queried for purely textual attributes, such as “fun facts,” again partitioned into forget and retain sets.
  • Image Understanding: On the same images, models are assessed on general visual reasoning (e.g., scene description, feature counting, ophthalmic scan interpretation), which should be insensitive to unlearning operations.

Tasks are structured such that forgetting and image understanding queries share identical visual input, thus measuring erasure at the semantic content level while guarding against trivial overfitting.
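
To make the pairing concrete, the hypothetical query triple below shares one image between a forget query and an image-understanding query, with a retain query drawn from a non-forgotten profile; the field names, paths, and wording are illustrative assumptions, not the benchmark's actual schema.

```python
# Hypothetical illustration of the forget / retain / understanding task pairing.
# All field names and file paths are invented for this sketch.
profile_image = "profiles/0042_face.png"  # image of a profile placed in the forget set

forget_query = {        # targeted private attribute: must no longer be recalled
    "image": profile_image,
    "question": "What is this person's phone number?",
    "desired_behavior": "refusal / no recall of the memorized number",
}

retain_query = {        # same attribute type, but for a profile outside the forget set
    "image": "profiles/0913_face.png",
    "question": "What is this person's phone number?",
    "desired_behavior": "correct recall of the memorized value",
}

understanding_query = { # general visual reasoning on the *same* forgotten image
    "image": profile_image,
    "question": "Describe the person's apparent age range and the background of the photo.",
    "desired_behavior": "fluent, accurate description (capability retained)",
}
```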

3. Dataset Construction and Composition

The S-MLLMUn Bench dataset comprises:

  • Profiles: 1,000 synthetic personal profiles, each with a GAN-generated face (via “thispersondoesnotexist”), a DeepEyeNet ophthalmic scan, and eleven textual attributes (name, age, birthplace, phone, salary, “fun facts”, and others) generated by the Qwen-VL-Plus model.
  • Fine-tuning Set: One question-answer pair per attribute per profile (11 × 1,000 = 11,000 Q/A examples) simulates a memorization phase for the base MLLMs.
  • Unlearning Sets: For each evaluation, 5%, 10%, or 15% of profiles (i.e., 50, 100, or 150) form the “forget” set, with a matched few-shot “retain” set anchoring performance on non-erased content.
  • Evaluation Set: All questions, across the forget, retain, and image-understanding tasks, are paraphrased by Qwen-VL-Plus to ensure semantic-level evaluation and prevent template overfitting.

This dual-split methodology yields approximately 11,000 evaluation queries (forget plus retain) and 1,000 additional queries for general image understanding across both synthetic and medical domains.
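
As a rough sketch of how the dual split can be constructed, the snippet below partitions profile IDs at a given forget ratio; the benchmark's exact sampling procedure is not specified here, so uniform random selection is an assumption.

```python
import random

def split_profiles(profile_ids, forget_ratio, seed=0):
    """Partition profile IDs into a forget set and a retain set.

    A minimal sketch of the dual-split construction; uniform random
    sampling and the fixed seed are assumptions for illustration.
    """
    rng = random.Random(seed)
    ids = list(profile_ids)
    rng.shuffle(ids)
    n_forget = int(len(ids) * forget_ratio)   # 5%, 10%, or 15% of 1,000 -> 50, 100, 150
    return set(ids[:n_forget]), set(ids[n_forget:])

# Example: 1,000 profiles at a 10% forget ratio -> 100 forget / 900 retain profiles.
forget_ids, retain_ids = split_profiles(range(1000), forget_ratio=0.10)
assert len(forget_ids) == 100 and len(retain_ids) == 900
```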

4. Evaluation Metrics and Scoring Formulation

S-MLLMUn Bench scores each response along three metrics:

  • ROUGE-L: Quantifies lexical overlap with the reference answer.
  • Fact Score (0–10): Measures semantic correctness, as judged by Qwen-Plus.
  • Meaningful Score (0–10): Captures output fluency and coherence, also judged by Qwen-Plus.

Aggregate comparisons rely on formalized measures:

  • Knowledge Removal Score ($K_r$):

$$K_r = 1 - \frac{M(f_{\theta'}, S_f)}{M(f_\theta, S_f)}$$

where $S_f$ is the forget set, $f_\theta$ the original (pre-unlearning) model, $f_{\theta'}$ the unlearned model, and $M$ the metric (ROUGE-L or Fact Score); $K_r \in [0, 1]$.

  • Retention Rate ($R_g$):

$$R_g = \frac{M(f_{\theta'}, S_{r+u})}{M(f_\theta, S_{r+u})}$$

where $S_{r+u}$ is the union of the retain set and the image-understanding set.

  • Bench Score (trade-off):

$$B(\alpha) = \alpha K_r + (1-\alpha) R_g, \quad \alpha \in [0,1]$$

which allows controllable emphasis on forgetting versus retention.

Each metric can be computed using ROUGE-L, Fact Score, or Meaningful Score, yielding a multi-dimensional performance profile for each method.
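
These scores reduce to simple arithmetic once the per-set metric averages are available. The sketch below assumes those averages (e.g., mean ROUGE-L over a set) have already been computed; the numbers in the usage lines are illustrative, not results from the paper.

```python
def knowledge_removal(metric_unlearned_forget, metric_base_forget):
    """K_r = 1 - M(f_theta', S_f) / M(f_theta, S_f); higher means deeper forgetting."""
    return 1.0 - metric_unlearned_forget / metric_base_forget

def retention_rate(metric_unlearned_retain, metric_base_retain):
    """R_g = M(f_theta', S_{r+u}) / M(f_theta, S_{r+u}); 1.0 means no capability loss."""
    return metric_unlearned_retain / metric_base_retain

def bench_score(k_r, r_g, alpha=0.5):
    """B(alpha) = alpha * K_r + (1 - alpha) * R_g, trading off forgetting vs. retention."""
    return alpha * k_r + (1.0 - alpha) * r_g

# Usage with made-up mean ROUGE-L values (for illustration only):
k_r = knowledge_removal(metric_unlearned_forget=0.20, metric_base_forget=0.80)  # 0.75
r_g = retention_rate(metric_unlearned_retain=0.76, metric_base_retain=0.80)     # 0.95
print(bench_score(k_r, r_g, alpha=0.5))                                         # 0.85
```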

5. Protocol: Model Setup and Unlearning Methodologies

The evaluation framework operates as follows:

  • Base MLLMs: LLaVA-OneVision-7B and Qwen2.5-VL-7B are first fine-tuned to memorize the synthetic dataset (11,000 Q/A items).
  • Compared Unlearning Methods:

    • GA Difference: Gradient ascent on forget, descent on retain set.
    • KL-Minimization: Maximizes forget-set loss, minimizes KL divergence on retain-output distribution.
    • IDK Tuning: Fine-tunes with “I don’t know” refusals for the forget set, plus few-shot retain set anchoring.
    • MANU: Modality-aware neuron pruning.
    • SMFA (Sculpted Memory Forgetting Adapter): Fine-tunes on refusal labels for the forget set plus a few-shot retain set, separately fine-tunes on the retain set alone, and combines directional-conflict and relative-magnitude masking to confine the parameter changes, merging via

    $$M = C \odot R, \quad \rho = \frac{\|\Delta W_f\|_F}{\|\Delta W_a\|_F + \epsilon}$$

    and $\Delta W'_f = \Delta W_f \odot (1 - M)$, where $C$ and $R$ are the directional-conflict and relative-magnitude masks and $\Delta W_f$, $\Delta W_a$ the refusal-tuning and retain-only updates, respectively (a minimal sketch of this step follows at the end of this section).

All fine-tuning uses LoRA adapters, and the evaluation queries are paraphrased so that forgetting is measured at the semantic level rather than by surface-template matching.
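
A minimal sketch of the SMFA mask-and-merge step is given below. Since only the formulas $M = C \odot R$, $\rho = \|\Delta W_f\|_F / (\|\Delta W_a\|_F + \epsilon)$, and $\Delta W'_f = \Delta W_f \odot (1 - M)$ are stated, the elementwise construction of $C$ (sign conflict) and $R$ ($\rho$-scaled magnitude dominance) used here is an assumption, not SMFA's exact procedure.

```python
import torch

def smfa_mask_and_merge(delta_w_f: torch.Tensor, delta_w_a: torch.Tensor,
                        eps: float = 1e-8) -> torch.Tensor:
    """Sketch of an SMFA-style masked update (assumed mask construction).

    delta_w_f: update from refusal fine-tuning (forget set + few-shot retain)
    delta_w_a: update from fine-tuning on the retain set alone
    """
    # Directional-conflict mask C: 1 where the two updates push in opposite directions.
    C = (torch.sign(delta_w_f) * torch.sign(delta_w_a) < 0).float()
    # Scalar relative-magnitude ratio rho = ||dW_f||_F / (||dW_a||_F + eps).
    rho = torch.linalg.norm(delta_w_f) / (torch.linalg.norm(delta_w_a) + eps)
    # Relative-magnitude mask R (assumed form): forget-update entries that dominate.
    R = (delta_w_f.abs() > rho * delta_w_a.abs()).float()
    M = C * R                        # combined mask confining the parameter edit
    return delta_w_f * (1.0 - M)     # Delta W'_f, the masked update to merge back

# Usage on a single LoRA weight delta (shapes illustrative):
dWf, dWa = torch.randn(64, 64) * 0.01, torch.randn(64, 64) * 0.01
dWf_masked = smfa_mask_and_merge(dWf, dWa)
```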

6. Experimental Findings and Comparative Analysis

Experiments across all baseline models, unlearning methods, and forget ratios (5%, 10%, 15%) yield critical comparative insights:

| Method | Forgetting ($K_r$) | Retention ($R_g$) | Image Understanding Degradation | Output Coherence (Meaningful Score) |
|---|---|---|---|---|
| GA Diff., KL-Min. | High (≈0.6–0.75) | Near zero (≪1) | Catastrophic | Low (<5/10) |
| MANU, IDK | Moderate | Substantial drop | ≤25% loss | Moderate |
| SMFA | High (≈0.6–0.75) | ≈0.95–0.99 | <5% loss (even for medical) | High (>9/10) |
  • SMFA uniquely achieves precise knowledge erasure ($K_r \approx 0.6$–$0.75$ on both ROUGE-L and Fact Score) while preserving 95–99% of general capabilities ($R_g$) and retaining high output fluency. On the ophthalmic subdomain, only SMFA limits image-understanding degradation to under 5%; other methods degrade by 20–40%.
  • Ablation studies confirm that both the directional-conflict and relative-magnitude masking components are required to simultaneously achieve deep forgetting and capability retention; omitting either leads to overgeneralization or incomplete unlearning.
  • The strength of forgetting can be modulated via the $k$ hyperparameter, with diminishing marginal impact on retention until excessive forgetting is attempted.

7. Context, Impact, and Future Directions

S-MLLMUn Bench decisively exposes and quantifies the trade-off at the heart of selective multimodal unlearning. Unlike prior benchmarks that focus on pure erasure or on global performance alone, its dual-structured protocol prevents methodological overfitting and ensures that general capability is not collateral damage. Under this protocol, SMFA is, to date, the only evaluated method approaching truly benign forgetting. A plausible implication is that future work must focus on mechanisms for fine-grained parameter sculpting that localize knowledge removal.

Potential directions include expanding beyond synthetic data to real-world sensitive data, applying the S-MLLMUn protocol to models with significantly larger context windows, and adjusting $\alpha$ in the trade-off score for application-specific prioritization. S-MLLMUn Bench thereby provides a foundational resource and methodological standard for research on privacy, safety, and compliance in MLLMs (Zeng et al., 25 Nov 2025).
