(ε,δ)-Certified Unlearning
- Certified unlearning is a privacy guarantee that ensures a model, updated to remove specified data, produces outputs nearly identical to those of a model retrained on the remaining data.
- It employs statistical side information and surrogate datasets when original training data is unavailable, calibrating Gaussian noise based on sensitivity and divergence estimates.
- Recent algorithmic implementations using trust-region and iterative updates achieve tighter bounds and improved utility–privacy trade-offs, even in deep, non-i.i.d. model settings.
An (ε,δ)-certified unlearning guarantee asserts that after removing a designated deletion set from a trained model (possibly using only indirect statistical information or a surrogate dataset), the resulting model output distribution is nearly indistinguishable, up to a multiplicative factor $e^{\varepsilon}$ and an additive slack $\delta$, from the output distribution of a model retrained from scratch on the same retained data. This guarantee is formalized using a privacy-inspired indistinguishability definition, ensuring verifiable removal of information in settings where original training data may be partially or entirely absent. Recent advances have produced concrete algorithms, generalization bounds, and practical implementations that enable (ε,δ)-certified unlearning even for deep networks and source-free regimes.
1. Formal Definition and Core Guarantee
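Let $D$ be the original training set of size $n$, and $D_u \subseteq D$ a subset of size $m$ (the forget set), with $D_r = D \setminus D_u$ the retained set. Let $w^*$ be the model originally trained on $D$, $w_r^*$ the model retrained from scratch on $D_r$, and $\mathcal{U}$ a (possibly randomized) unlearning mechanism. The (ε,δ)-certified unlearning property is defined as follows (see (Basaran et al., 6 Jun 2025), Definition 2.1): for every measurable set $T$ of models,
$$\Pr\big[\mathcal{U}(D_u, w^*) \in T\big] \;\le\; e^{\varepsilon}\,\Pr\big[\mathcal{U}(\emptyset, w_r^*) \in T\big] + \delta,$$
with the same bound holding when swapping the two distributions. This ensures that for any statistical test, the advantage in distinguishing unlearning from true retraining is bounded by a quantity that depends only on $\varepsilon$ and $\delta$.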
The addition of suitably calibrated noise (e.g., Gaussian with variance depending on model sensitivity) is essential for certification (Basaran et al., 6 Jun 2025).
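The role of the noise can be checked numerically in the simplest case. The sketch below assumes (for illustration only, not as the setting of the cited papers) that the unlearned and retrained parameters differ by at most `param_gap` in Euclidean norm and that both are released with isotropic Gaussian noise of standard deviation `sigma`. Under that assumption, the smallest $\delta$ for which the (ε,δ) inequality holds has a known closed form for Gaussians; the function names are hypothetical.

```python
import math


def gaussian_cdf(x: float) -> float:
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def smallest_delta(eps: float, param_gap: float, sigma: float) -> float:
    """Smallest delta such that N(w, sigma^2 I) and N(w', sigma^2 I) are
    (eps, delta)-indistinguishable when ||w - w'||_2 <= param_gap.

    Assumption: both models receive the same isotropic Gaussian noise.
    """
    if param_gap == 0.0:
        return 0.0  # identical means: the two distributions coincide
    a = param_gap / (2.0 * sigma)
    b = eps * sigma / param_gap
    value = gaussian_cdf(a - b) - math.exp(eps) * gaussian_cdf(-a - b)
    return max(0.0, value)  # guard against tiny negative rounding error
```

For a fixed gap, increasing `sigma` drives the required `delta` toward zero, which is why the noise scale must grow with the bound on the model discrepancy.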
2. Unlearning without Access to Source Data
Traditional certified unlearning approaches rely on full access to the training data to re-run learning or extract side information (e.g., Hessian, gradient statistics). In cases where the source data $D$ is no longer available, the framework in (Basaran et al., 6 Jun 2025) enables data removal using a surrogate dataset $D_s$ that approximates the statistical properties of $D$. The key steps are:
- Approximating the behavior of the original model via side information (e.g., gradients, Hessians) computed on $D_s$.
- Quantifying the mismatch between the surrogate distribution $P_s$ and the unknown true distribution $P$ using the total variation distance $\mathrm{TV}(P, P_s)$, which is upper-bounded by a function of the KL-divergence, as per Bretagnolle–Huber.
- Calibrating injected Gaussian noise in proportion to both the model sensitivity and the divergence between $P$ and $P_s$: larger distribution shift necessitates larger noise for the same (ε,δ) certificate.
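Letting $\Delta$ denote an upper bound on the model discrepancy, i.e., the distance between the unlearned and retrained parameters (cf. Theorem 4.1 in (Basaran et al., 6 Jun 2025)), the noise variance follows the standard Gaussian-mechanism calibration:
$$\sigma^2 = \frac{2\,\Delta^2 \ln(1.25/\delta)}{\varepsilon^2}$$
Practical implementations approximate the distribution distance entering $\Delta$ via classifier-based estimates of the KL-divergence; the theoretical bounds assume the true distance is known, but an empirical upper bound still suffices to guarantee privacy, at the cost of injecting more (but still valid) noise than strictly necessary (Basaran et al., 6 Jun 2025).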
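The calibration step can be sketched in a few lines. This is a minimal illustration that assumes the classical Gaussian-mechanism rule $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\varepsilon$; the exact constants in the cited theorem may differ. The names `gaussian_noise_std` and `release` and the argument `discrepancy_bound` are hypothetical.

```python
import math

import numpy as np


def gaussian_noise_std(discrepancy_bound: float, eps: float, delta: float) -> float:
    """Noise standard deviation for a given bound on ||w_unlearned - w_retrained||_2.

    Assumption: classical Gaussian-mechanism calibration, whose textbook
    proof requires 0 < eps < 1.
    """
    if eps <= 0.0 or not (0.0 < delta < 1.0):
        raise ValueError("need eps > 0 and 0 < delta < 1")
    return discrepancy_bound * math.sqrt(2.0 * math.log(1.25 / delta)) / eps


def release(params: np.ndarray, discrepancy_bound: float, eps: float,
            delta: float, rng: np.random.Generator) -> np.ndarray:
    """Perturb the updated parameters with isotropic Gaussian noise."""
    sigma = gaussian_noise_std(discrepancy_bound, eps, delta)
    return params + rng.normal(0.0, sigma, size=params.shape)
```

Because `sigma` is linear in `discrepancy_bound`, any slack in the estimated bound (for example, from a conservative divergence estimate) translates directly into proportionally more noise.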
3. Trust-Region and Distribution-Aware Certified Unlearning
In settings where the deletion set is not distributed identically to the training set (e.g., when deletion requests are biased), the distribution shift between the forget set and the retained data can render one-step Newton-based unlearning sub-optimal or void the tightness of previous guarantees. In such cases (Guo et al., 11 Jan 2026), effective certified unlearning proceeds by:
- Employing iterative Newton-type updates restricted to a trust region, so that each step accurately tracks local geometry despite distribution shift.
- Dynamically updating the trust-region radius to maintain control over local smoothness and contractivity of the loss function.
- Providing sharp, iterative residual bounds capturing the accumulated error from successive local quadratic models.
- Calibrating the Gaussian noise injected at the end proportional to the cumulative bound on the model divergence from retraining.
This approach results in tighter certified bounds under non-i.i.d. deletions, reducing the utility drop relative to i.i.d.-assumption methods and yielding better indistinguishability and accuracy across the tested datasets (Guo et al., 11 Jan 2026).
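The iterative structure can be illustrated with a generic sketch. This is not the algorithm of (Guo et al., 11 Jan 2026): it assumes an L2-regularized logistic-regression objective on the retained data, uses a fixed trust-region radius rather than a dynamically updated one, and replaces the exact trust-region subproblem with a Newton step clipped to the radius. All function names are hypothetical.

```python
import numpy as np


def grad_hess(w, X, y, lam):
    """Gradient and Hessian of mean logistic loss plus (lam / 2) * ||w||^2."""
    n, d = X.shape
    p = 1.0 / (1.0 + np.exp(-(X @ w)))          # predicted probabilities
    g = X.T @ (p - y) / n + lam * w
    H = (X.T * (p * (1.0 - p))) @ X / n + lam * np.eye(d)
    return g, H


def trust_region_newton(w, X, y, lam, radius, steps):
    """Newton iterations whose step length is clipped to the trust radius.

    Assumption: clipping the Newton direction is a simple stand-in for
    solving the trust-region subproblem exactly.
    """
    w = np.array(w, dtype=float)
    for _ in range(steps):
        g, H = grad_hess(w, X, y, lam)
        step = -np.linalg.solve(H, g)
        norm = np.linalg.norm(step)
        if norm > radius:
            step *= radius / norm                # stay inside the trust region
        w = w + step
    return w
```

Starting from the model trained on all data and iterating on the retained data only, the iterates approach the retrained solution; the remaining gap after the last step is the quantity that the final Gaussian noise has to cover.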
4. Algorithmic Realizations and Practical Pseudocode
(ε,δ)-certified unlearning generally involves three phases:
- Estimate Sensitivity or Approximation Error. Compute or bound the distance between the unlearning model and the retrained model: either via analytic sensitivity analysis, model difference upper bounds, or per-instance procedures.
- Calibrate Noise. Set the standard deviation of the added Gaussian (or Laplacian) noise proportional to the computed sensitivity bound (cf. the noise-variance expression in Section 2). Theorem 4.2 in (Basaran et al., 6 Jun 2025) and analogous results in (Zhang et al., 2024) and (Guo et al., 11 Jan 2026) formalize this calibration.
- Output the Noisy Updated Model. Perform an algorithmic update (e.g., Newton, trust-region Newton, influence-style update) with surrogate or side-information, and finally perturb by the calibrated noise.
A high-level pseudocode for the source-free surrogate scenario is found in (Basaran et al., 6 Jun 2025) (see “CertifiedUnlearnSurrogate”).
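The three phases can be traced on a toy problem. The sketch below is not the "CertifiedUnlearnSurrogate" procedure: it assumes ridge regression with direct access to the retained data, for which a single Newton step lands exactly on the retrained solution. The approximation-error bound is therefore passed in as a hypothetical argument `residual_bound` instead of being derived, and the noise scale assumes the classical Gaussian-mechanism calibration.

```python
import math

import numpy as np


def ridge_fit(X, y, lam):
    """Minimizer of 0.5 * ||X w - y||^2 + 0.5 * lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def newton_unlearn(w, X_r, y_r, lam):
    """One Newton step on the retained-data ridge objective, starting from w."""
    d = X_r.shape[1]
    H = X_r.T @ X_r + lam * np.eye(d)
    g = X_r.T @ (X_r @ w - y_r) + lam * w
    return w - np.linalg.solve(H, g)


def certified_unlearn(w, X_r, y_r, lam, residual_bound, eps, delta, rng):
    # Phase 1: bound on ||w_update - w_retrained||_2 (supplied by the caller here).
    # Phase 2: calibrate the Gaussian noise to that bound.
    sigma = residual_bound * math.sqrt(2.0 * math.log(1.25 / delta)) / eps
    # Phase 3: apply the update, then perturb it.
    w_new = newton_unlearn(w, X_r, y_r, lam)
    return w_new + rng.normal(0.0, sigma, size=w_new.shape), sigma
```

For non-quadratic losses the Newton step is only approximate, and phase 1 becomes the substantive part of the analysis.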
5. Statistical Distance Estimation and Noise Scaling
The statistical closeness between surrogate and original data distributions is critical. The certified framework quantifies divergence using total-variation and/or KL divergence, leveraging the Bretagnolle–Huber inequality:
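$$\mathrm{TV}(P, P_s) \;\le\; \sqrt{1 - \exp\big(-\mathrm{KL}(P \,\|\, P_s)\big)}$$
where $P$ and $P_s$ denote the true and surrogate data distributions.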
In practical settings, the KL divergence between the true and surrogate distributions is estimated by forming two models (original and surrogate-trained) and computing the difference in conditional likelihoods on the surrogate dataset, plus a marginal input divergence estimated by, e.g., SGLD samples and variational bounds (Basaran et al., 6 Jun 2025). The resulting upper bound on the total variation distance is then used in the sensitivity-to-noise calibration.
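The Bretagnolle–Huber step is easy to check in a case where both quantities are known exactly. The sketch below uses two unit-variance one-dimensional Gaussians whose means differ by `mu`, chosen only as an illustrative assumption: their KL divergence is $\mu^2/2$ and their total variation distance is $\operatorname{erf}\big(|\mu|/(2\sqrt{2})\big)$. The function names are hypothetical.

```python
import math


def tv_bound_from_kl(kl: float) -> float:
    """Bretagnolle-Huber upper bound on total variation given a KL value."""
    if kl < 0.0:
        raise ValueError("KL divergence must be non-negative")
    return math.sqrt(-math.expm1(-kl))  # sqrt(1 - exp(-kl)), stable for small kl


def gaussian_shift_kl(mu: float) -> float:
    """KL(N(0, 1) || N(mu, 1))."""
    return 0.5 * mu * mu


def gaussian_shift_tv(mu: float) -> float:
    """Exact total variation distance between N(0, 1) and N(mu, 1)."""
    return math.erf(abs(mu) / (2.0 * math.sqrt(2.0)))
```

Unlike bounds that grow without limit in the KL divergence, this one never exceeds 1, so it remains informative when the surrogate is far from the true distribution.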
6. Empirical Evaluation and Utility–Privacy Trade-offs
Empirical validation in (Basaran et al., 6 Jun 2025) and (Guo et al., 11 Jan 2026) demonstrates that properly calibrated certified unlearning:
- Achieves membership inference attack (MIA) accuracy at chance levels (approximately 50%), matching the true retrain baseline.
- Yields utility loss (error or test accuracy drop) comparable to retraining or superior to approximate unlearning baselines.
- Automatically increases noise as the surrogate diverges from the true distribution or as the deletion set becomes more biased.
Benchmarks span synthetic (Gaussian) experiments with controlled distribution drift, real-world vision datasets (CIFAR-10, StanfordDogs, Caltech256), and zero-shot/surrogate-only settings (MNIST→USPS and vice versa). Across all cases, the empirical "forget score" closely tracks the theoretical (ε,δ) trade-off, confirming the reliability of surrogate-based certified unlearning (Basaran et al., 6 Jun 2025).
7. Extensions, Limitations, and Current Research Directions
Extensions and open directions include:
- Generalization to Nonconvex and Deep Models: While the surrogate framework assumes smooth, (strongly) convex losses, certified unlearning has also been adapted for the highly nonconvex settings relevant to deep neural networks using influence-based, LiSSA-style, and noisy fine-tuning methods (Zhang et al., 2024), though some theoretical assumptions may be relaxed or require approximation.
- Approximate Distance Estimation: In practical, source-free scenarios, TV and KL can only be estimated up to upper bounds, potentially making the certified guarantee more conservative (more noise injected than strictly necessary).
- Tighter Distribution-Shift Handling: Recent research emphasizes adaptivity to the actual distribution shift incurred by non-i.i.d. deletions or surrogate mismatch, as in (Guo et al., 11 Jan 2026), which shows that a naive Newton step can yield vacuous bounds under large shifts.
- Alternative Sensitivity Concepts: Retain sensitivity and per-instance noise calibration, which appear in the emerging literature but are not discussed in (Basaran et al., 6 Jun 2025), are related directions for reducing the conservativeness of the certificate.
The (ε,δ)-certified unlearning framework now offers a spectrum of algorithms, ranging from Newton- and trust-region approaches for source-free and distribution-shifted settings to stochastic optimization and deep learning-influence variants, enabling verifiable erasure of data contributions in practical, privacy-sensitive model update scenarios (Basaran et al., 6 Jun 2025, Guo et al., 11 Jan 2026).