SMFA: Sculpted Memory Forgetting Adapter

Updated 2 December 2025

The paper presents SMFA's novel method to confine unlearning to specific memory regions using loss-driven adapters and dynamic masks for precise concept erasure.
It leverages domain-specific techniques in text-to-image diffusion and multimodal language models, applying concept-aware losses and LoRA-style adapters for controlled updates.
Benchmark results show SMFA achieves significant forgetting efficiency with high retention of unrelated knowledge, outperforming previous unlearning baselines.

The Sculpted Memory Forgetting Adapter (SMFA) is a specialized unlearning framework for controlling knowledge retention and deletion in large-scale neural models. In text-to-image diffusion models and multimodal LLMs (MLLMs), SMFA enables selective “erasure” of sensitive or undesirable concepts while rigorously preserving unrelated functional capabilities. The framework was independently presented for both diffusion-based image synthesis and MLLMs, with modality-specific instantiations. Both share a central paradigm: confining model updates to memory regions responsible for the targeted concepts by combining loss-driven adapters and precision masking mechanisms (Li et al., 12 Apr 2025, Zeng et al., 25 Nov 2025).

1. Architectural Principles

SMFA is an adapter-based unlearning method, but the adapter’s construction and application are tailored to the model domain.

In T2I diffusion settings, SMFA augments the pre-trained U-Net with a dynamic, sparsity-inducing gradient mask and a composite loss function:

Dynamic Mask ( $M_\text{dyn} \in \{0,1\}^{\dim(\theta)}$ ): Gates U-Net parameters, restricting updates to weights implicated by the current unlearning objective.
Concept-Aware Loss ( $\mathcal{L}_\text{total}$ ): Guides unlearning, aligning forgotten concepts to user-defined superclasses and regularizing via knowledge distillation.

In MLLMs, SMFA operates by fine-tuning a frozen base model ( $W_0$ ) with a Memory Forgetting Adapter ( $\Delta W_f$ ) that is further sculpted using a Retaining Anchor ( $\Delta W_a$ ):

MFA: $\Delta W_f$ is trained to transform all forgotten responses (on $\mathcal{D}_f$ ) to explicit refusals (from a large label bank), while preserving few-shot retained samples ( $\mathcal{D}_r^\text{few}$ ).
Masking: The update $\Delta W_f$ is masked to zero-out weights where the forgetting and retaining anchors conflict in sign and magnitude, thus localizing memory changes.

Both implementations eschew wholesale model retraining in favor of controlled, local adaptation (Li et al., 12 Apr 2025, Zeng et al., 25 Nov 2025).

2. Dynamic Masking and Gradient Control

The SMFA for diffusion models introduces a three-stage dynamic mask pipeline:

Gradient Masking: Each update computes the loss gradient $g^{(t)} = \nabla_\theta \mathcal{L}_\text{total}$ , multiplies it elementwise by the mask $M_\text{dyn}^{(t)}$ , and updates weights only where the mask is active.
Accumulated Gradient Statistics: A running sum of absolute gradients $A^{(t+1)} = A^{(t)} + |g^{(t)}|$ is maintained to inform mask updates.
Scheduled Mask Reallocation: Every $U$ steps, a fraction $\tau(t; r_m, T_\text{end}) = (r_m/2)[1+\cos(\pi t/T_\text{end})]$ of active and inactive mask entries are swapped based on $A$ statistics, enforcing sustained sparsity and adaptability.

In MLLMs, masking is formulated over learned LoRA adapters:

Directional Conflict Mask ( $C_{ij}$ ): $C_{ij}=1$ if $\Delta W_{a,ij}\cdot\Delta W_{f,ij} < 0$ , otherwise 0.
Relative Magnitude Mask ( $R_{ij}$ ): $R_{ij}=1$ if $k\rho|\Delta W_{a,ij}| < |\Delta W_{f,ij}|$ , with $\rho = \|\Delta W_f\|_F/(\|\Delta W_a\|_F+\epsilon)$ .
Final Mask ( $M$ ): $M=C\odot R$ .
Sculpted Adapter: $\Delta W_f'$ is obtained by zeroing out masked entries: $\Delta W_f' = \Delta W_f \odot (1-M)$ .

This ensures updates effect forgetting only in memory regions implicated by the “forget” task, mitigating collateral damage to unrelated knowledge (Li et al., 12 Apr 2025, Zeng et al., 25 Nov 2025).

3. Concept-Aware and Selective Loss Functions

For diffusion models, SMFA employs a three-part objective:

Unlearning Loss ( $\mathcal{L}_\text{unlearn}$ ): Penalizes the U-Net when predictions for the target (to-be-forgotten) concept $C$ do not match those for its mapped superclass $C_s$ :

$\mathcal{L}_\text{unlearn}(\theta) = \mathbb{E}_{(x,c,c_s), t, \epsilon}\left[ \| \hat{\epsilon}_\theta(x_t, c_s, t) - \hat{\epsilon}_\theta(x_t, c, t) \|^2 \right]$

Superclass Alignment ( $\mathcal{L}_\text{align}$ ): Ensures the U-Net’s output for $C_s$ prompts aligns with the true underlying noise, maintaining output semantic structure.
Knowledge-Distillation Regularization ( $\mathcal{L}_\text{reg}$ ): Locks in prior unlearned concepts by aligning current predictions to those of a snapshot teacher on all previously erased $C_s$ prompts.

The total loss is

$\mathcal{L}_\text{total} = \mathcal{L}_\text{unlearn} + \alpha \cdot \mathcal{L}_\text{align} + \beta \cdot \mathcal{L}_\text{reg}$

with typical $\alpha \approx 0.25$ , $\beta \approx 0.25$ .

In MLLMs, SMFA minimizes standard cross-entropy over refusal-labeled forget sets ( $\mathcal{D}_f^{\text{idk}}$ ) and few-shot retained examples ( $\mathcal{D}_r^\text{few}$ ). The regularization is imposed via hard masking of adapter updates rather than as a differentiable term (Li et al., 12 Apr 2025, Zeng et al., 25 Nov 2025).

4. Optimization and Training Protocol

The optimization for multi-concept forgetting is sequential and relies on model snapshots:

Initialization: Pre-trained model parameters are loaded; running statistics ( $A$ ) for the mask are optionally warmed up.
Per-Concept Forgetting:
- For each concept $C$ , copy the current model as the “teacher” for distillation regularization.
- Fine-tune on $(x, c, c_s)$ and $(x, c_s)$ batches, compute all loss terms, update using masked gradients.
- Update mask every $U$ steps by reallocating based on $A$ .
- Discard teacher after concept completion.
Output: Final model reflects all target forgets, with unrelated concepts and functionalities preserved.

In MLLMs, LoRA adapters are learned for both forget and retain anchors, only modifying the adapter weights ( $\Delta W_f, \Delta W_a$ ) with the base model fixed. Masking is applied post-hoc to $\Delta W_f$ using the directional/magnitude scheme, yielding the sculpted adapter. The process is compatible with various model sizes (e.g., Qwen2.5-VL-7B, LLaVA-OneVision-7B) and forget ratios up to 15% of finetuning data. Hyperparameters such as the mask trade-off $k$ (default $k=5$ ) are empirically validated (Zeng et al., 25 Nov 2025).

5. Benchmarking, Evaluation, and Performance

In MLLMs, selective unlearning efficacy is quantified using the S-MLLMUn Bench. Key evaluation axes include:

Forgetting ( $R_f$ , $F_f$ ): ROUGE-L and Fact Score for the forget set, where lower is better.
Retention ( $R_r$ , $F_r$ , $M_r$ ): ROUGE-L, Fact, and Meaningful scores for retained and general understanding sets, where higher is better.

SMFA achieves low $R_f$ (strong erasure) with minimal decreases in $R_r$ and image understanding scores, outperforming GA, KL, MANU, and IDK baselines. Ablation experiments indicate both directional and magnitude-based mask criteria are required for robust trade-off. Increasing mask hyperparameter $k$ deepens forgetting up to a point, after which retention degrades. The forgetting occurs precisely via refusals (“I’m not the right source for that”), not arbitrary collapse. Even with few-shot anchors ( $|\mathcal{D}_r^\text{few}| = |\mathcal{D}_f|$ ), SMFA maintains coverage across memory and vision tasks (Zeng et al., 25 Nov 2025).

For diffusion models, unlearning is measured by output fidelity, forgetting effectiveness, and semantic integrity post-unlearning. SMFA yields improved results over previous unlearning techniques, especially in multi-concept scenarios, due to targeted weight changes and explicit alignment losses preventing drift or collapse (Li et al., 12 Apr 2025).

6. Module Interactions, Limitations, and Practical Guidance

Adapter–Mask Synergy: In both modalities, the “forgetting” adapter direction is reined in by the retention anchor or dynamic mask, which localizes erasure and prevents spillage into unrelated capabilities.
Loss Granularity: The multi-component loss in diffusion settings aligns forgotten concepts ( $C$ ) toward semantically meaningful superclasses ( $C_s$ ), unlike naïve unlearning, which often yields degenerate or semantically empty results.
Regularization and Memory Locking: Knowledge distillation regularization in diffusion and hard-masked updates in MLLMs safeguard previously unlearned concepts against sequential catastrophic re-learning.
Hyperparameter Sensitivity: Forget vs retain balance is tunable (e.g., via $k$ in MLLMs, $\alpha, \beta$ in diffusion). Extreme values can induce over-forgetting or retention failures; practically, moderate values yield best trade-offs.
Implementation Strategies: In MLLMs, LoRA-style adapters inserted in all linear layers attain capacity-efficiency balance. In diffusion, 50% sparsity and periodic mask reallocation are empirically optimal.
Scalability: In both domains, SMFA’s localized adapter paradigm admits efficient post-hoc, modular updates to deployed large-scale models without end-to-end retraining.

7. Context and Relation to Broader Unlearning Research

SMFA addresses long-standing weaknesses in model unlearning: instability, residual memory, over-forgetting, and generation collapse. Compared to optimization-based global unlearning (e.g., full-model retraining, GA Difference, KL Minimization), SMFA confines weight changes spatially and semantically, ensuring targeted unlearning with bounded side effects. The two independently developed SMFA variants establish a modular, scalable template for future selective unlearning work in both generative and multimodal neural architectures (Li et al., 12 Apr 2025, Zeng et al., 25 Nov 2025).