A Framework for Evaluating Machine Unlearning
The paper "Mirror Mirror on the Wall, Have I Forgotten it All? A New Framework for Evaluating Machine Unlearning" introduces a formal framework for evaluating machine unlearning, termed computational unlearning. The authors aim to address shortcomings in existing unlearning evaluation methods by defining a robust, formal criterion based on indistinguishability between models produced by an unlearning process and control models trained without the data to be forgotten.
Contributions and Methodology
The primary contribution of this work is the notion of computational unlearning, defined as the inability of an adversary to distinguish a model produced by an unlearning procedure from a control model trained without the data points to be forgotten. The paper describes a formal adversarial setup, reminiscent of cryptographic security proofs, in which an unlearning method must pass both white-box and black-box indistinguishability tests.
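A plausible formalization of this requirement, in the spirit of cryptographic indistinguishability (the notation below is illustrative and may differ from the paper's exact definition), is that every efficient adversary's distinguishing advantage must be negligible:

```latex
% Illustrative notation (not necessarily the paper's):
% L = randomized learning algorithm, U = unlearning algorithm,
% D = full training set, D_f = forget set,
% \mathcal{A} = adversary, \lambda = security parameter.
\[
\Bigl|\,
  \Pr\bigl[\mathcal{A}\bigl(U(L(D),\, D_f)\bigr) = 1\bigr]
  \;-\;
  \Pr\bigl[\mathcal{A}\bigl(L(D \setminus D_f)\bigr) = 1\bigr]
\,\Bigr|
\;\le\; \mathrm{negl}(\lambda)
\]
```

In the white-box variant the adversary receives the model parameters themselves; in the black-box variant it only gets query access to the model's outputs.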
The unlearning algorithms are evaluated empirically through distinguishing attacks built on two scoring methods: membership inference attack (MIA) scores and Kullback-Leibler (KL) divergence scores. These scores are used to decide whether models produced by common unlearning techniques can be told apart from control models. Notably, several established unlearning methods fail this test, pointing to the need for stronger mechanisms.
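As a rough illustration of the two kinds of scores (not the authors' implementation; the model handles `candidate` and `control` and the probe/forget tensors are hypothetical placeholders), a distinguisher might compute:

```python
import torch
import torch.nn.functional as F

def kl_divergence_score(candidate, control, probe_inputs):
    """Average KL divergence between the candidate and control models'
    predictive distributions on a probe batch (larger = easier to distinguish)."""
    with torch.no_grad():
        p_log = F.log_softmax(candidate(probe_inputs), dim=1)  # candidate log-probs
        q_log = F.log_softmax(control(probe_inputs), dim=1)    # control log-probs
    # KL(candidate || control), averaged over the probe batch.
    return F.kl_div(q_log, p_log, log_target=True, reduction="batchmean").item()

def mia_loss_score(model, forget_inputs, forget_labels):
    """Simple MIA-style score: average cross-entropy loss on the forget set.
    A model that still 'remembers' those points tends to have lower loss
    on them than a control model that never saw them."""
    with torch.no_grad():
        return F.cross_entropy(model(forget_inputs), forget_labels).item()
```

Either score can feed a simple threshold rule that guesses whether a given model was unlearned or retrained from scratch.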
Empirical Results
The empirical evaluation exposes the limitations of current machine unlearning methods. Heuristic approaches such as selective synaptic dampening (SSD) and certified, differential privacy (DP)-based methods alike fail to satisfy the computational unlearning criterion: adversaries running the proposed distinguisher algorithms can tell unlearned models from their control counterparts with high success rates.
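One way to quantify "high success rates" is to estimate the distinguisher's advantage over repeated independent trials. A minimal sketch follows; the `make_unlearned`, `make_control`, and `score` callables are hypothetical placeholders for the experimenter's own pipeline, not functions from the paper:

```python
import random

def estimate_advantage(make_unlearned, make_control, score, threshold, n_trials=100):
    """Estimate a threshold distinguisher's advantage: how often it correctly
    tells an unlearned model from a control model retrained without the forget
    set. An advantage near 0 means the two are effectively indistinguishable."""
    correct = 0
    for _ in range(n_trials):
        bit = random.randint(0, 1)                    # secret challenge bit
        model = make_unlearned() if bit == 1 else make_control()
        guess = 1 if score(model) > threshold else 0  # adversary's decision rule
        correct += int(guess == bit)
    accuracy = correct / n_trials
    return 2 * accuracy - 1                           # advantage over random guessing
```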
The paper also analyzes how the size of the forget set and the hyperparameters of the unlearning methods affect distinguishability. The findings show that larger forget sets make the distinguishability problem worse for existing methods. Moreover, evaluating certified deep unlearning frameworks across a range of noise magnitudes exposes an inherent trade-off between utility and unlearning fidelity.
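A study of that trade-off can be organized as a simple parameter sweep. The sketch below only outlines the bookkeeping; all callables are assumed placeholders for the experimenter's own certified-unlearning and evaluation code, not routines from the paper:

```python
def sweep_noise_levels(certified_unlearn, evaluate_utility, evaluate_advantage, sigmas):
    """For each noise magnitude, record utility (e.g. test accuracy) and how
    distinguishable the resulting model remains from a control model."""
    results = []
    for sigma in sigmas:
        model = certified_unlearn(noise_scale=sigma)   # larger sigma = stronger forgetting
        results.append({
            "sigma": sigma,
            "utility": evaluate_utility(model),        # drops as noise grows
            "advantage": evaluate_advantage(model),    # distinguisher advantage
        })
    return results
```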
Theoretical Insights
The paper's theoretical results also contribute to the understanding of computational unlearning. It proves that deterministic unlearning methods cannot satisfy the computational unlearning criterion: meeting it requires randomization, i.e., entropic learning and unlearning schemes. Furthermore, while differential-privacy-based methods could in principle satisfy the criterion, doing so would incur a substantial utility collapse, making practical deployment challenging.
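One way to see the connection to differential privacy (a standard DP-style bound, not necessarily the paper's own derivation): if the distribution $P$ of unlearned models and the distribution $Q$ of control models satisfy $(\varepsilon, \delta)$-indistinguishability, then for any adversary $\mathcal{A}$,

```latex
\[
\Pr_{M \sim P}\bigl[\mathcal{A}(M) = 1\bigr]
\;\le\;
e^{\varepsilon} \cdot \Pr_{M \sim Q}\bigl[\mathcal{A}(M) = 1\bigr] + \delta ,
\]
```

so any adversary's distinguishing advantage is at most roughly $(e^{\varepsilon} - 1) + \delta$. Driving that bound toward negligible forces $\varepsilon$ and $\delta$ to be very small, which in such schemes means injecting enough noise to severely degrade accuracy; this is the utility collapse the paper points to.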
Implications and Future Directions
These findings matter for any field that relies on machine learning models and where data privacy and the right to be forgotten are at stake. The authors argue for a shift towards requiring randomization in unlearning algorithms, challenging existing methodologies that often rely on deterministic processes. Addressing utility collapse in differentially private models remains an open research problem with significant implications for both privacy and model performance.
Future work may focus on practical constructions that achieve computational unlearning, and on unlearning methods that balance security, utility, and performance. The paper also suggests examining how unlearning might help align generative models with societal expectations of privacy.
In summary, this work advances the understanding of machine unlearning by framing it within a criterion of computational indistinguishability, posing both technical challenges and opportunities for the development of more robust unlearning frameworks. It calls for continued exploration into the interplay between privacy guarantees and model performance, potentially shaping future standards and practices in machine learning applications.