PNSGD: Projected Noisy SGD Unlearning

Updated 23 June 2026

PNSGD is a certified machine unlearning method for convex empirical risk minimization that leverages projected noisy SGD to approximate data removal without full retraining.
Its analysis bounds the Rényi divergence between the unlearned model distribution and the retrained-from-scratch distribution by tracking the infinite Wasserstein distance ($W_\infty$) between the two processes.
The algorithm efficiently addresses both sequential and batch removal scenarios, significantly reducing computational overhead compared to complete retraining.

Projected Noisy SGD Unlearning (PNSGD) is a certified machine unlearning framework for convex empirical risk minimization problems. PNSGD achieves approximate removal of individual or multiple data points from a trained model’s influence, producing model parameter distributions that are close (in the Rényi sense) to the retraining-from-scratch distribution. Its central algorithmic innovation leverages projected noisy stochastic gradient descent with rigorous complexity guarantees and efficient handling of both sequential and batch unlearning scenarios. PNSGD is the first method to offer approximate unlearning guarantees for convex losses under the projected noisy SGD regime, using infinite Wasserstein distance to track the separation between adjacent learning processes (Chien et al., 2024).

1. Problem Setup and Objective

Given a training set $D = \{d_1, \ldots, d_n\}$, the empirical risk objective is

$f_D(x) = \frac{1}{n} \sum_{i=1}^n \ell(x; d_i)$

where $x \in C_R = \{x \in \mathbb{R}^d : \|x\|_2 \leq R\}$, a closed Euclidean ball. Upon a user removal request (e.g., to comply with "the right to be forgotten"), an adjacent dataset $D'$ is defined, differing from $D$ by replacement of one or more points. The unlearning goal is to transform an existing model $x \approx \arg\min_{x \in C_R} f_D(x)$ into a new model whose distribution is $(\alpha, \epsilon)$-close, in the Rényi sense, to the distribution obtained by retraining from scratch on $D'$, without rerunning SGD from initialization.

2. PNSGD Algorithmic Procedure

The PNSGD procedure relies on a fixed cyclic sequence of mini-batches $B = \{B^0, ..., B^{n/b-1}\}$, each of size $b$, so that one epoch consists of $n/b$ steps. The learning process operates as follows:

Initialization: Start from an initial point $x_0^0$ drawn from an initialization distribution on $C_R$.
Iterative Update (epoch $t$, inner step $j$), with step size $\eta$ and noise scale $\sigma$:

$x_t^{j+1} = \Pi_{C_R}\left(x_t^j - \eta\, g(x_t^j, B^j) + \sqrt{2\eta\sigma^2}\, W_t^j\right)$

where
- $g(x, B^j) = \frac{1}{b} \sum_{i \in B^j} \nabla \ell(x; d_i)$ is the mini-batch gradient,
- $W_t^j \sim \mathcal{N}(0, I_d)$ is additive Gaussian noise,
- $\Pi_{C_R}$ is the orthogonal projection onto $C_R$.

Each epoch runs $j = 0, \ldots, n/b - 1$ and hands its last iterate to the next epoch, $x_{t+1}^0 = x_t^{n/b}$. After $T$ epochs, set $x_T^0$ as the current learned model.

Unlearning is achieved by rerunning the above update on $D'$ for $K$ epochs, starting from $y_0^0 = x_T^0$ (the previously trained model). The mini-batch sequence $B$ is reused for direct comparability.
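The learning and unlearning loops described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the quadratic loss, the dataset, and every numeric value are hypothetical, and the noise scale $\sqrt{2\eta\sigma^2}$ is an assumption about how the Gaussian term is scaled.

```python
import numpy as np

def project_ball(x, R):
    """Orthogonal projection onto the Euclidean ball of radius R."""
    norm = np.linalg.norm(x)
    return x if norm <= R else x * (R / norm)

def pnsgd_epochs(x, data, batches, grad_loss, eta, sigma, R, epochs, rng):
    """Run `epochs` passes of projected noisy SGD over a fixed batch sequence."""
    for _ in range(epochs):
        for idx in batches:  # cyclic order B^0, ..., B^{n/b-1}
            g = np.mean([grad_loss(x, data[i]) for i in idx], axis=0)
            # Assumed noise scaling: sqrt(2 * eta * sigma^2) times a standard normal.
            noise = np.sqrt(2.0 * eta * sigma**2) * rng.standard_normal(x.shape)
            x = project_ball(x - eta * g + noise, R)
    return x

# Hypothetical loss l(x; d) = 0.5 * ||x - d||^2, whose gradient is x - d.
def grad_loss(x, d):
    return x - d

rng = np.random.default_rng(0)
n, dim, b, R = 8, 3, 2, 1.0
D = rng.uniform(-0.5, 0.5, size=(n, dim))
batches = [np.arange(j * b, (j + 1) * b) for j in range(n // b)]

# Learning: T = 20 epochs on D from a random start inside C_R.
x0 = project_ball(rng.standard_normal(dim), R)
x_T = pnsgd_epochs(x0, D, batches, grad_loss, 0.1, 0.01, R, 20, rng)

# Unlearning: replace one point, then run K = 3 epochs on D' starting from x_T.
D_prime = D.copy()
D_prime[3] = rng.uniform(-0.5, 0.5, size=dim)
y_K = pnsgd_epochs(x_T, D_prime, batches, grad_loss, 0.1, 0.01, R, 3, rng)
```

The same function serves both phases; only the dataset, the starting point, and the epoch count change, which mirrors the description of unlearning as a continuation of training on $D'$.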

3. Theoretical Assumptions and Conditions

PNSGD unlearning operates under the following assumptions:

For every data point $d$, the loss $\ell(\cdot; d)$ is
- $L$-smooth: $\|\nabla \ell(x; d) - \nabla \ell(y; d)\|_2 \leq L \|x - y\|_2$
- $m$-strongly-convex over $C_R$
- $M$-Lipschitz: $|\ell(x; d) - \ell(y; d)| \leq M \|x - y\|_2$
The constraint set $C_R$ has nonzero Lebesgue measure.
$\ell(\cdot; d)$ has a continuous gradient.

These regularity conditions are essential for ensuring the convergence and contractivity needed to establish unlearning guarantees.
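As a concrete instance of these assumptions, consider a hypothetical $\ell_2$-regularized logistic loss $\ell(x; (a, y)) = \log(1 + e^{-y\, a^\top x}) + \frac{\lambda}{2}\|x\|_2^2$ with $\|a\|_2 \leq A$, restricted to the ball of radius $R$. Standard bounds give smoothness $A^2/4 + \lambda$, strong convexity $\lambda$, and Lipschitz constant $A + \lambda R$. The sketch below computes these constants and checks them numerically on random points; the loss and all values are illustrative assumptions, not taken from the source.

```python
import numpy as np

def logistic_constants(A, lam, R):
    """Return (L, m, M) for the regularized logistic loss on the ball of radius R."""
    L = 0.25 * A**2 + lam   # Hessian is at most (||a||^2 / 4 + lam) * I
    m = lam                 # logistic term is convex; the ridge term adds lam
    M = A + lam * R         # ||grad|| <= ||a|| + lam * ||x|| <= A + lam * R
    return L, m, M

def grad_loss(x, a, y, lam):
    """Gradient of log(1 + exp(-y * a.x)) + (lam / 2) * ||x||^2."""
    z = y * a.dot(x)
    return -y * a / (1.0 + np.exp(z)) + lam * x

def sample_ball(rng, dim, R):
    """Random point inside the ball of radius R (not uniform, which is fine here)."""
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v) * R * rng.uniform()

rng = np.random.default_rng(0)
A, lam, R, dim = 1.0, 0.1, 2.0, 5
L, m, M = logistic_constants(A, lam, R)

max_grad_norm, max_smooth_ratio, min_convex_ratio = 0.0, 0.0, np.inf
for _ in range(1000):
    a = sample_ball(rng, dim, 1.0)
    a = a / np.linalg.norm(a) * A            # feature vector with norm exactly A
    y = rng.choice([-1.0, 1.0])
    x1, x2 = sample_ball(rng, dim, R), sample_ball(rng, dim, R)
    g1, g2 = grad_loss(x1, a, y, lam), grad_loss(x2, a, y, lam)
    dist = np.linalg.norm(x1 - x2)
    max_grad_norm = max(max_grad_norm, np.linalg.norm(g1))
    max_smooth_ratio = max(max_smooth_ratio, np.linalg.norm(g1 - g2) / dist)
    min_convex_ratio = min(min_convex_ratio, (g1 - g2).dot(x1 - x2) / dist**2)
```

The observed maxima and minima stay within the analytic constants, which is what the assumptions require for each individual data point.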

4. Certified Unlearning Guarantees

The guarantee tracks the distributions of both the training process and the unlearning process in terms of Rényi divergence, conditioned on the batch sequence $B$.

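Define $c = 1 - \eta m$ (for step size $\eta \leq 1/L$), and the initial $W_\infty$ separation

$Z_B = 2R\, c^{Tn/b} + \min\left(\frac{1 - c^{Tn/b}}{1 - c^{n/b}} \cdot \frac{2\eta M}{b},\; 2R\right)$

The main guarantee (Theorem 3.2) for any $\alpha > 1$: the output of the $K$-th unlearning epoch is $(\alpha, \epsilon)$-close in Rényi divergence to the retraining distribution on $D'$, where

$\epsilon \leq \frac{\alpha Z_B^2}{2\eta\sigma^2}\, c^{2Kn/b}$

In the fully converged training regime ($T \to \infty$), $c^{Tn/b} \to 0$, so $2R\, c^{Tn/b}$ vanishes and

$Z_B \leq \frac{1}{1 - c^{n/b}} \cdot \frac{2\eta M}{b}$

It suffices to take

$K \geq \frac{b}{2n} \cdot \frac{\log\left(\alpha Z_B^2 / (2\eta\sigma^2 \epsilon)\right)}{\log(1/c)}$

unlearning epochs, each containing $n/b$ steps.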

The proof bounds the Rényi divergence by tracking the infinite Wasserstein ($W_\infty$) distance between process distributions, leveraging the contractivity of the noisy iteration and a privacy-amplification-by-iteration lemma.
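To make the guarantee concrete, the following sketch evaluates a bound of the assumed form $\epsilon \leq \frac{\alpha Z_B^2}{2\eta\sigma^2} c^{2Kn/b}$ with $c = 1 - \eta m$ and $Z_B = 2R\,c^{Tn/b} + \min\left(\frac{1 - c^{Tn/b}}{1 - c^{n/b}} \cdot \frac{2\eta M}{b}, 2R\right)$, then solves for the smallest number of unlearning epochs $K$ that meets a target $\epsilon$. These expressions are a reading of the theorem rather than a verified transcription, and all numeric values are hypothetical.

```python
import math

def initial_separation(R, eta, M, m, n, b, T):
    """Assumed form of the initial W_infinity separation Z_B after T training epochs."""
    c = 1.0 - eta * m
    decay = c ** (T * n / b)
    per_epoch = 2.0 * eta * M / b
    return 2.0 * R * decay + min((1.0 - decay) / (1.0 - c ** (n / b)) * per_epoch, 2.0 * R)

def renyi_epsilon(alpha, Z, eta, sigma, m, n, b, K):
    """Assumed Renyi bound after K unlearning epochs of n/b steps each."""
    c = 1.0 - eta * m
    return alpha * Z**2 / (2.0 * eta * sigma**2) * c ** (2.0 * K * n / b)

def epochs_needed(alpha, Z, eta, sigma, m, n, b, eps_target):
    """Smallest integer K for which the assumed bound is at most eps_target."""
    c = 1.0 - eta * m
    eps0 = alpha * Z**2 / (2.0 * eta * sigma**2)
    if eps0 <= eps_target:
        return 0
    return math.ceil(b / (2.0 * n) * math.log(eps0 / eps_target) / math.log(1.0 / c))

# Hypothetical problem constants.
n, b, T = 1000, 100, 50
eta, sigma, m, M, R = 0.1, 0.05, 0.1, 1.0, 1.0
alpha, eps_target = 2.0, 0.01

Z = initial_separation(R, eta, M, m, n, b, T)
K = epochs_needed(alpha, Z, eta, sigma, m, n, b, eps_target)
eps_at_K = renyi_epsilon(alpha, Z, eta, sigma, m, n, b, K)
```

Because the bound decays geometrically in $K$, the required epoch count grows only logarithmically in $Z_B^2 / \epsilon$.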

5. Computational Complexity and Comparison with Retraining

The retraining-from-scratch method requires more epochs because it starts from a fresh initialization, whose initial $W_\infty$ distance to the target can be as large as the diameter $2R$ of $C_R$. By contrast, PNSGD unlearning starts at the current model, so its initial separation is $Z_B$, and its required $K$ is reduced by:

A smaller initial separation: in the converged regime $Z_B$ is of order $\eta M / b$ per epoch rather than $2R$, and the required $K$ depends on it through $\log Z_B^2$.
An additional $n/b$ factor in the exponent of the contraction $c^{2Kn/b}$.

In full-batch mode ($b = n$), the process recovers PNGD unlearning with update contraction $c^{2K}$. When $b < n$, the convergence accelerates exponentially in $n/b$ as $c^{2Kn/b}$. Empirically, mini-batch unlearning needs fewer epochs than full-batch. Reported experiments demonstrate that under comparable privacy constraints, PNSGD achieves similar utility using only $2\%$ (mini-batch) and $10\%$ (full-batch) of the gradient computations compared to state-of-the-art gradient unlearning baselines (Chien et al., 2024).
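The comparison can be illustrated numerically. The sketch below assumes the epoch requirement $K \geq \frac{b}{2n} \log\left(\frac{\alpha Z^2}{2\eta\sigma^2\epsilon}\right) / \log(1/c)$ with $c = 1 - \eta m$, and plugs in $Z = 2R$ as a stand-in for retraining and the converged separation $Z = \frac{2\eta M / b}{1 - c^{n/b}}$ for unlearning. Treating retraining with the same formula is a simplifying assumption made only for illustration, and every constant is hypothetical.

```python
import math

def epochs_for(Z, alpha, eta, sigma, m, n, b, eps):
    """Epochs needed under the assumed bound, starting from separation Z."""
    c = 1.0 - eta * m
    eps0 = alpha * Z**2 / (2.0 * eta * sigma**2)
    return max(0, math.ceil(b / (2.0 * n) * math.log(eps0 / eps) / math.log(1.0 / c)))

n = 1000
eta, sigma, m, M, R = 0.1, 0.05, 0.1, 1.0, 1.0
alpha, eps = 2.0, 0.01
c = 1.0 - eta * m

results = {}
for b in (n, n // 10):  # full-batch, then mini-batch
    Z_unlearn = (2.0 * eta * M / b) / (1.0 - c ** (n / b))  # converged-regime separation
    Z_retrain = 2.0 * R                                     # diameter of the ball
    k_unlearn = epochs_for(Z_unlearn, alpha, eta, sigma, m, n, b, eps)
    k_retrain = epochs_for(Z_retrain, alpha, eta, sigma, m, n, b, eps)
    # Every epoch touches all n points once, so gradient evaluations = epochs * n.
    results[b] = {"unlearn_epochs": k_unlearn, "retrain_epochs": k_retrain,
                  "unlearn_grads": k_unlearn * n, "retrain_grads": k_retrain * n}

for b, row in results.items():
    print(b, row)
```

Under these assumptions the mini-batch setting needs fewer epochs than full-batch for both procedures, and unlearning needs fewer than retraining at either batch size.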

6. Sequential and Batch Unlearning Extensions

PNSGD efficiently supports both sequential and batch removal scenarios:

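Sequential removal of $S$ points: For a sequence of datasets $D_0, D_1, \ldots, D_S$, each differing from its predecessor in one point, and after $K_s$ unlearning epochs for the $s$-th request, the initial $W_\infty$ separation for the next request satisfies

$Z_B^{(s+1)} = \min\left(c^{K_s n/b}\, Z_B^{(s)} + Z_B,\; 2R\right), \quad Z_B^{(1)} = Z_B$

where $c = 1 - \eta m$ is the per-step contraction factor and $Z_B$ is the single-request separation. The same Rényi bound applies per step, with $Z_B^{(s)}$ in place of $Z_B$. The triangle inequality in $W_\infty$ ensures efficiency, and the Rényi order $\alpha$ does not grow exponentially, so sequential unlearning requests remain tractable.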
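The effect of the triangle inequality can be seen by iterating a recursion of the assumed form $Z^{(s+1)} = \min\left(c^{K_s n/b} Z^{(s)} + Z, 2R\right)$ with $Z^{(1)} = Z$, where $Z$ is the single-request separation and $c$ the per-step contraction factor. The recursion and all values below are illustrative assumptions.

```python
def sequential_separations(Z_single, c, steps_per_epoch, epochs_per_request, R):
    """Initial separation before each request under the assumed recursion."""
    Z = min(Z_single, 2.0 * R)
    history = [Z]
    # The separation before request s+1 depends on the epochs spent on request s.
    for K_s in epochs_per_request[:-1]:
        Z = min(c ** (K_s * steps_per_epoch) * Z + Z_single, 2.0 * R)
        history.append(Z)
    return history

c = 1.0 - 0.1 * 0.1          # hypothetical eta = 0.1, m = 0.1
steps_per_epoch = 10         # hypothetical n / b
Z_single = 0.02
history = sequential_separations(Z_single, c, steps_per_epoch, [5] * 20, R=1.0)

# With a constant epoch budget the recursion converges to this fixed point.
fixed_point = Z_single / (1.0 - c ** (5 * steps_per_epoch))
```

With a fixed per-request budget the separation saturates at a constant multiple of the single-request value instead of growing with the number of requests, which is why repeated requests stay tractable.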

Batch removal of $S$ points: For $D$ and $D'$ differing in $S$ points, spread over $G$ mini-batches $B^{j_1}, \ldots, B^{j_G}$ with $S_{j_g}$ replacements in $B^{j_g}$, the initial separation is bounded by

$Z_B \leq 2R\, c^{Tn/b} + \min\left(\frac{1 - c^{Tn/b}}{1 - c^{n/b}} \sum_{g=1}^{G} \frac{2\eta M S_{j_g}}{b},\; 2R\right)$

The initial separation increases linearly with $S = \sum_{g} S_{j_g}$ until it reaches the $2R$ cap; the same converged Rényi bound applies.
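A small sketch shows how the replaced points are counted per mini-batch and how the separation scales. It assumes the converged-regime form $Z \leq \min\left(\frac{1}{1 - c^{n/b}} \sum_g \frac{2\eta M S_{j_g}}{b}, 2R\right)$ with $c = 1 - \eta m$; the function name, batch layout, and constants are hypothetical.

```python
def batch_initial_separation(removed, batches, eta, M, m, b, R):
    """Assumed converged-regime separation when several points are replaced at once."""
    c = 1.0 - eta * m
    removed = set(removed)
    # S_j: number of replaced points that fall in mini-batch j.
    counts = {}
    for j, batch in enumerate(batches):
        hits = len(removed.intersection(batch))
        if hits:
            counts[j] = hits
    per_epoch = sum(2.0 * eta * M * s / b for s in counts.values())
    Z = min(per_epoch / (1.0 - c ** len(batches)), 2.0 * R)
    return Z, counts

n, b = 1000, 100
batches = [list(range(j * b, (j + 1) * b)) for j in range(n // b)]
eta, M, m, R = 0.1, 1.0, 0.1, 1.0

Z1, counts1 = batch_initial_separation([3], batches, eta, M, m, b, R)
Z3, counts3 = batch_initial_separation([3, 17, 250], batches, eta, M, m, b, R)
```

Here three replaced points give three times the single-point separation, regardless of how they are distributed across mini-batches, as long as the $2R$ cap is not reached.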

No auxiliary "private state" is needed for sequential or batch requests, preserving practical deployability.

7. Significance and Application Context

PNSGD provides certified $(\alpha, \epsilon)$-Rényi unlearning for convex problems with guarantees rooted in process-level distributional contractivity and infinite Wasserstein metrics. Its complexity saving is theoretically quantified relative to retraining, and it enables unlearning workflows that address both isolated and multiple-point removal scenarios without algorithmic modifications.

The approach directly addresses legal and regulatory mandates such as data erasure requirements, and is applicable wherever compliance with user-initiated deletion requests is critical and retraining from scratch is computationally prohibitive. Its applicability is bounded by the convexity and regularity assumptions described above. The development of PNSGD establishes a foundational framework for certified, efficient, and extensible machine unlearning in convex empirical risk settings (Chien et al., 2024).


References

Chien et al. Certified Machine Unlearning via Noisy Stochastic Gradient Descent (2024)
