Certified Unlearning for Convex Models

Updated 23 March 2026

The paper introduces certified unlearning for convex models by establishing provable guarantees via (ε,δ)- and Rényi divergence definitions.
The methodology uses Projected Noisy SGD (PNSGD) to efficiently update model parameters, ensuring strong utility and privacy after data removal.
Empirical results demonstrate significant computational savings and robust utility-privacy trade-offs compared to full retraining and other unlearning frameworks.

Certified unlearning for convex models refers to algorithmic frameworks that, given a trained model and a request to remove one or more data points, efficiently modify the model so that its output distribution becomes provably (statistically) indistinguishable from that of a model retrained from scratch on the retained dataset. In convex settings, these guarantees are formalized using (ε,δ)-unlearning or Rényi-unlearning definitions, often leveraging connections to differential privacy. Certified frameworks bound the residual influence of deleted data while retaining utility guarantees and saving computation relative to full retraining.

1. Formal Definitions and Theoretical Guarantees

Certified unlearning is rigorously defined either via approximate differential privacy ((ε,δ)-unlearning) or Rényi divergence ((α,ε)-Rényi unlearning, RU).

(ε,δ)-Certified Unlearning: For a learning algorithm $\mathcal{M}$ and an unlearning procedure $\mathcal{U}$, the output distribution after a deletion, $\rho=\mathcal{U}(\mathcal{M}(\mathcal{D}),\mathcal{D},\mathcal{D}')$, where $\mathcal{D}'$ is the updated dataset, is $(\epsilon,\delta)$-indistinguishable from the retraining distribution $\nu'=\mathcal{M}(\mathcal{D}')$; formally, for $X\sim\rho$, $Y\sim\nu'$, and all measurable $S$,

$\Pr[X\in S] \leq e^\epsilon\,\Pr[Y\in S] + \delta\quad\text{and}\quad \Pr[Y\in S] \leq e^\epsilon\,\Pr[X\in S] + \delta.$

(α,ε)-Rényi-Unlearning (RU): RU considers the Rényi divergence $D_\alpha$, requiring $D_\alpha(\rho\,\|\,\nu')\leq \epsilon$ for adjacent datasets $\mathcal{D}$ and $\mathcal{D}'$.

It is standard that $(\alpha,\epsilon)$-RU implies $\left(\epsilon + \frac{\log(1/\delta)}{\alpha-1},\,\delta\right)$-unlearning for any $\delta\in(0,1)$.
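The two definitions can be connected numerically. The sketch below is an illustration, not code from the reference: it uses the closed-form Rényi divergence between two isotropic Gaussians with equal variance, $\alpha\|\mu_1-\mu_2\|^2/(2s^2)$, and the standard Rényi-to-$(\epsilon,\delta)$ conversion $\epsilon = \epsilon_{\text{RU}} + \log(1/\delta)/(\alpha-1)$. The function names and the example values are hypothetical.

```python
import math

def renyi_gaussian(mu1, mu2, s, alpha):
    """Renyi divergence of order alpha between N(mu1, s^2 I) and N(mu2, s^2 I)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    return alpha * sq_dist / (2.0 * s ** 2)

def ru_to_eps_delta(alpha, eps_renyi, delta):
    """Convert an (alpha, eps_renyi) Renyi guarantee into an (eps, delta) guarantee."""
    if alpha <= 1.0 or not 0.0 < delta < 1.0:
        raise ValueError("need alpha > 1 and 0 < delta < 1")
    return eps_renyi + math.log(1.0 / delta) / (alpha - 1.0)

# Two output distributions whose means differ by 0.1 in one coordinate.
eps_renyi = renyi_gaussian([0.0, 0.0], [0.1, 0.0], s=1.0, alpha=10.0)
eps = ru_to_eps_delta(alpha=10.0, eps_renyi=eps_renyi, delta=1e-5)
print(f"Renyi eps = {eps_renyi:.4f}, (eps, delta) = ({eps:.4f}, 1e-05)")
```

In practice one would minimize the converted $\epsilon$ over a grid of $\alpha$ values, since the Rényi bound grows with $\alpha$ while the conversion term shrinks.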

The key task is calibrating the unlearning update and noise injection such that, after deletion, the model distribution matches retraining up to these divergence bounds, even when the number of deletions or sequential requests grows.

2. Projected Noisy SGD Unlearning (PNSGD) Methodology

The Projected Noisy Stochastic Gradient Descent (PNSGD) approach (Chien et al., 2024) provides a general and scalable mechanism for certified unlearning in convex empirical risk minimization:

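Learning procedure: The model is trained by projected noisy SGD over $T$ epochs using cyclic mini-batches of size $b$, where each update is of the form

$x_t^{j+1} = \Pi_{\mathcal{C}_R}\left(x_t^{j} - \eta\, g(x_t^{j}, \mathcal{B}^{j}) + \sqrt{2\eta\sigma^2}\, W_t^{j}\right)$

with $g(x, \mathcal{B}^{j}) = \frac{1}{b}\sum_{i\in\mathcal{B}^{j}} \nabla f(x; d_i)$ being the mini-batch gradient and $W_t^{j}\sim\mathcal{N}(0, I_d)$. Here $\Pi_{\mathcal{C}_R}$ is the projection onto the feasible set of radius $R$, $\eta$ is the step size, and $\sigma$ is the noise scale.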

Unlearning procedure: To unlearn, PNSGD initializes at the pre-trained parameter and reruns the same cyclic noisy SGD on the updated dataset for $K$ epochs, using the same mini-batch cycle.
Stationarity and distribution coupling: The Markov chain induced by PNSGD admits a unique stationary distribution for fixed mini-batch sequences, and as $T\to\infty$, the parameter converges to this stationary law.
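The learning and unlearning procedures above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the loss is a ridge-regularized least-squares objective chosen for convenience, the feasible set is a Euclidean ball of radius `R`, a deletion is modeled by overwriting the removed row with a substitute point so the batch structure stays fixed, and all names (`pnsgd`, `project_ball`, `grad_ridge`) and hyperparameter values are hypothetical.

```python
import numpy as np

def project_ball(x, R):
    """Euclidean projection onto the ball of radius R centered at the origin."""
    norm = np.linalg.norm(x)
    return x if norm <= R else x * (R / norm)

def grad_ridge(x, A, y, lam):
    """Mean gradient of 0.5*(a.x - y)^2 + 0.5*lam*||x||^2 over the rows of A."""
    return A.T @ (A @ x - y) / len(y) + lam * x

def pnsgd(x0, A, y, batches, epochs, eta, sigma, R, lam, rng):
    """Run `epochs` passes of projected noisy SGD over a fixed cyclic batch order."""
    x = x0.copy()
    for _ in range(epochs):
        for idx in batches:
            g = grad_ridge(x, A[idx], y[idx], lam)
            noise = np.sqrt(2.0 * eta * sigma**2) * rng.standard_normal(x.shape)
            x = project_ball(x - eta * g + noise, R)
    return x

rng = np.random.default_rng(0)
n, d, b = 200, 5, 20
A = rng.standard_normal((n, d))
y = A @ np.ones(d) + 0.1 * rng.standard_normal(n)
batches = np.arange(n).reshape(-1, b)  # fixed cyclic mini-batch sequence

eta, sigma, R, lam = 0.05, 0.01, 10.0, 0.1
T, K = 20, 1  # learning epochs, unlearning epochs

# Learning on the original dataset D.
x_learn = pnsgd(np.zeros(d), A, y, batches, T, eta, sigma, R, lam, rng)

# Updated dataset D': overwrite the point to be removed (row 0) with a substitute.
A_new, y_new = A.copy(), y.copy()
A_new[0] = rng.standard_normal(d)
y_new[0] = A_new[0] @ np.ones(d)

# Unlearning: warm start from x_learn and run K epochs on D' with the same batches.
x_unlearn = pnsgd(x_learn, A_new, y_new, batches, K, eta, sigma, R, lam, rng)
print("parameter shift after unlearning:", np.linalg.norm(x_unlearn - x_learn))
```

The only difference between learning and unlearning in this sketch is the starting point and the dataset; the update rule, batch order, and noise scale are shared, which is what makes the two chains comparable.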

The unlearning process is contractive due to strong convexity, and by tracking the infinite Wasserstein distance $W_\infty$ between the learning and unlearning chains, one obtains finite-sample divergence bounds.
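The contraction claim can be made concrete with a short worked derivation. It is a sketch of the standard argument, assuming the mini-batch objective is $m$-strongly convex and $L$-smooth, the step size satisfies $\eta \le 1/L$, and both chains are coupled by sharing the same batch and the same noise draw; the symbols $\phi$, $u$, $v$ are introduced here for illustration.

```latex
% Gradient step on a batch that does not contain the replaced point:
\phi(x) = x - \eta\, g(x, \mathcal{B}^{j}), \qquad
\|\phi(x) - \phi(y)\| \le (1 - \eta m)\,\|x - y\| \quad \text{for } \eta \le 1/L .

% Adding the same noise to both chains leaves their distance unchanged,
% and projection onto a convex set is non-expansive:
\|\Pi_{\mathcal{C}_R}(u) - \Pi_{\mathcal{C}_R}(v)\| \le \|u - v\| .

% Hence, with c = 1 - \eta m, after k such coupled steps:
\|x_k - y_k\| \le c^{\,k}\,\|x_0 - y_0\| .
```

Because the bound holds for every realization of the shared noise, it controls the worst-case ($W_\infty$) distance between the two chains rather than only an average distance.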

3. Certified Utility, Privacy, and Complexity Guarantees

Certified Unlearning Bounds

For $L$-smooth, $M$-Lipschitz, $m$-strongly convex $f$, and step size $\eta \leq 1/L$, let $c = 1-\eta m$:

Rényi-unlearning bound [Theorem 3.2 in (Chien et al., 2024)]:

$D_\alpha(\rho\,\|\,\nu') \leq \frac{\alpha Z_\mathcal{B}^2}{2\eta\sigma^2}\, c^{2Kn/b}$

where:

$Z_\mathcal{B} = 2R\, c^{Tn/b} + \min\left(\frac{1-c^{Tn/b}}{1-c^{n/b}}\cdot\frac{2\eta M}{b},\; 2R\right)$

and $Z_\mathcal{B}$ bounds initial distribution discrepancies. Here $n$ is the dataset size, $T$ and $K$ are the numbers of learning and unlearning epochs, $b$ is the mini-batch size, $\sigma$ is the noise scale, and $R$ is the radius of the projection set.

Stationarity regime: If training is converged ($T$ large), then $c^{Tn/b}\to 0$ and

$Z_\mathcal{B} \to \min\left(\frac{1}{1-c^{n/b}}\cdot\frac{2\eta M}{b},\; 2R\right)$

with the limit depending on the batching through $b$ and the number of mini-batches per epoch $n/b$.

Convex-only regime: If the strong-convexity parameter $m = 0$ (no strong convexity), guarantees degrade to linear decay, but converged models still admit certified unlearning.
Sequential and batch unlearning: The framework extends naturally to multiple sequential deletions (with bounds growing at most linearly in the number of unlearning rounds) and to batch deletions (with bounds dependent on the number of affected batches).
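To see how such a bound translates into a number of unlearning epochs, the sketch below evaluates it numerically. It assumes the strongly convex bound takes the form $\epsilon = \frac{\alpha Z^2}{2\eta\sigma^2} c^{2Kn/b}$ with $c = 1-\eta m$ and $Z = 2Rc^{Tn/b} + \min\left(\frac{1-c^{Tn/b}}{1-c^{n/b}}\cdot\frac{2\eta M}{b}, 2R\right)$; this closed form is an assumption of the sketch and should be checked against Theorem 3.2 of the reference before use. Function names and parameter values are hypothetical.

```python
import math

def z_bound(R, eta, m, M, n, b, T):
    """Assumed bound on the distance between adjacent learning chains after T epochs."""
    if m <= 0:
        raise ValueError("this sketch assumes strong convexity (m > 0)")
    c = 1.0 - eta * m
    steps_per_epoch = n / b
    decay = c ** (T * steps_per_epoch)
    drift = (1.0 - decay) / (1.0 - c ** steps_per_epoch) * 2.0 * eta * M / b
    return 2.0 * R * decay + min(drift, 2.0 * R)

def renyi_eps(alpha, K, sigma, R, eta, m, M, n, b, T):
    """Assumed Renyi-unlearning epsilon after K unlearning epochs."""
    c = 1.0 - eta * m
    Z = z_bound(R, eta, m, M, n, b, T)
    return alpha * Z ** 2 / (2.0 * eta * sigma ** 2) * c ** (2.0 * K * n / b)

def min_epochs(target_eps, max_epochs=10_000, **params):
    """Smallest K whose assumed bound is at most target_eps."""
    for K in range(max_epochs + 1):
        if renyi_eps(K=K, **params) <= target_eps:
            return K
    raise RuntimeError("target not reached within max_epochs")

params = dict(alpha=2.0, sigma=0.05, R=10.0, eta=0.1, m=0.1, M=1.0,
              n=1000, b=100, T=100)
eps_one_epoch = renyi_eps(K=1, **params)
K_needed = min_epochs(0.5, **params)
print(f"eps after 1 epoch: {eps_one_epoch:.3f}; epochs for eps <= 0.5: {K_needed}")
```

Each additional unlearning epoch multiplies the bound by $c^{2n/b}$, so the required $K$ grows only logarithmically as the target $\epsilon$ shrinks.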

Computational Complexity

Certified PNSGD unlearning achieves significant complexity savings:

Method	Complexity (fraction of retrain)	Gradient cost scaling
Retrain	$\mathcal{D}'$ 6	$\mathcal{D}'$ 7 epochs
PNSGD Unlearn	$\mathcal{D}'$ 8 (full), $\mathcal{D}'$ 9 (mini)	$(\epsilon,\delta)$ 0 epochs
D2D	$(\epsilon,\delta)$ 1	$(\epsilon,\delta)$ 2 steps (increases with $(\epsilon,\delta)$ 3)

For typical settings, PNSGD achieves an order-of-magnitude reduction in gradient computations compared to full retraining or D2D.

Utility–Privacy–Complexity Trade-offs

Smaller noise $\sigma$ improves utility but requires more unlearning steps.
Smaller batch size $b$ accelerates convergence (privacy decay) at the cost of potential utility degradation.
Sequential unlearning remains efficient: SGLU incurs only a small fraction of the retraining cost even after many deletions, with certified indistinguishability.

Empirical evaluations on MNIST and CIFAR-10 confirm that for $(\epsilon,\delta)$-unlearning at various $\epsilon$, the method matches or exceeds the utility of baselines while using a fraction of the computational cost.

4. Comparison to Other Certified Unlearning Paradigms

Delete-to-Descent (D2D) [Neel et al.]: Relies on either Hessian-inverse updates or full-batch gradient descent plus Gaussian noise; the number of gradient steps grows with sequential removals.
Langevin Unlearning (PNGD) [Chien et al.]: Convergence and privacy decay depend on the number of deletions, with an exponential decay rate that is looser than that of PNSGD, which has improved scaling and a tighter decay rate.

The PNSGD framework thus achieves optimal (contractivity-based) privacy amplification by iteration, higher utility for a fixed privacy target, and a tunable balance among accuracy, privacy, and runtime.

5. Sequential and Batch Unlearning Protocols

Sequential unlearning [Theorem 4.1]: The $W_\infty$-distance between the coupled stationary distributions after each removal enables certified bounds for any number of requests, with step-sizes chosen to maintain the overall divergence bound.
Batch unlearning [Corollary 4.2]: Batch requests affecting several distinct mini-batches yield an explicit divergence bound after one unlearning epoch, quadratic in the number of affected mini-batches.

This ensures practical deployment in streaming/deletion-intensive settings, with provable certification for each operation.

6. Empirical Performance and Practical Considerations

Empirical studies confirm several key properties:

Unlearning cost: One-point unlearning requires only a single epoch ($K=1$), matching or exceeding the utility and accuracy of D2D and PNGD at a fraction of the gradient computations.
Cumulative complexity: Removal of multiple points requires only a small fraction of the retraining cost.
Privacy–utility trade-off: Calibration of noise and batch-size parameters determines the rate at which unlearning achieves both privacy and high accuracy.

In this regime, projected noisy SGD-based methods such as PNSGD yield the first certified approximate unlearning guarantees for convex models with efficient support for both batch and sequential data removal, scalable to large problem instances and deletion rates (Chien et al., 2024).


References

Chien et al. (2024). Certified Machine Unlearning via Noisy Stochastic Gradient Descent.
