
PNSGD: Projected Noisy SGD Unlearning

Updated 23 June 2026
  • PNSGD is a certified machine unlearning method for convex empirical risk minimization that leverages projected noisy SGD to approximate data removal without full retraining.
  • It uses rigorous mathematical tools, tracking Rényi divergence via the infinite Wasserstein distance, to certify that unlearned models closely match retrained-from-scratch distributions.
  • The algorithm efficiently addresses both sequential and batch removal scenarios, significantly reducing computational overhead compared to complete retraining.

Projected Noisy SGD Unlearning (PNSGD) is a certified machine unlearning framework for convex empirical risk minimization problems. PNSGD achieves approximate removal of individual or multiple data points from a trained model’s influence, producing model parameter distributions that are close (in the Rényi sense) to the retraining-from-scratch distribution. Its central algorithmic innovation leverages projected noisy stochastic gradient descent with rigorous complexity guarantees and efficient handling of both sequential and batch unlearning scenarios. PNSGD is the first method to offer approximate unlearning guarantees for convex losses under the projected noisy SGD regime, using infinite Wasserstein distance to track the separation between adjacent learning processes (Chien et al., 2024).

1. Problem Setup and Objective

Given a training set $D = \{d_1, \dots, d_n\}$, the empirical risk objective is

$$f_D(x) = \frac{1}{n} \sum_{i=1}^n \ell(x; d_i)$$

where $x \in C_R = \{x \in \mathbb{R}^d : \|x\|_2 \leq R\}$, a closed Euclidean ball. Upon a user removal request (e.g., to comply with "the right to be forgotten"), an adjacent dataset $D'$ is defined, differing from $D$ by replacement of some point(s). The unlearning goal is to transform an existing model $x \approx \arg\min_{x \in C_R} f_D(x)$ into a new model whose distribution is $(\alpha, \epsilon)$-close, in the Rényi sense, to the distribution obtained by retraining from scratch on $D'$, all without rerunning SGD from initialization.
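The objects in this setup can be made concrete with a few helpers. The sketch below is a minimal illustration, assuming a hypothetical toy loss $\ell(x; d) = \tfrac{1}{2}\|x - d\|_2^2$ and data points stored as plain Python lists; the helper names are illustrative, not from the paper. It evaluates $f_D$, tests membership in $C_R$, and forms an adjacent dataset $D'$ by replacing one point.

```python
import math

def sq_loss(x, d):
    """Hypothetical toy loss: l(x; d) = 0.5 * ||x - d||_2^2."""
    return 0.5 * sum((xi - di) ** 2 for xi, di in zip(x, d))

def empirical_risk(loss, x, data):
    """f_D(x) = (1/n) * sum_i l(x; d_i)."""
    return sum(loss(x, d) for d in data) / len(data)

def in_ball(x, R):
    """Membership test for C_R = {x : ||x||_2 <= R}."""
    return math.sqrt(sum(v * v for v in x)) <= R

def adjacent_dataset(data, index, replacement):
    """Return D', a copy of D whose point at `index` is replaced (D is left unchanged)."""
    new_data = list(data)
    new_data[index] = replacement
    return new_data

D = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
D_prime = adjacent_dataset(D, 0, [0.0, 0.0])  # replace d_1
x = [0.0, 0.0]
risk_D = empirical_risk(sq_loss, x, D)
risk_D_prime = empirical_risk(sq_loss, x, D_prime)
```

Replacement (rather than deletion) keeps $n$ fixed, which is what makes $D$ and $D'$ adjacent in the sense used above.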

2. PNSGD Algorithmic Procedure

The PNSGD procedure relies on a fixed cyclic sequence of mini-batches $B = \{B^0, \dots, B^{n/b-1}\}$, each of size $b$. The learning process operates as follows:

  • Initialization: Start from an initial point $x_0^0 \in C_R$.
  • Iterative Update (epoch $k$, inner step $j$):

$$x_k^{j+1} = \Pi_{C_R}\left(x_k^j - \eta\, g(x_k^j; B^j) + \sqrt{2\eta\sigma^2}\, W_k^j\right), \qquad x_{k+1}^0 = x_k^{n/b},$$

where $g(x; B^j) = \frac{1}{b}\sum_{i \in B^j} \nabla \ell(x; d_i)$ is the mini-batch gradient, $W_k^j \sim \mathcal{N}(0, I_d)$ is additive Gaussian noise, and $\Pi_{C_R}$ is the orthogonal projection onto $C_R$.

  • After $T$ epochs, set $x_T^0$ as the current learned model.

Unlearning is achieved by running the same update on $D'$ for $K$ epochs, starting from $x_T^0$ (the previously trained model) rather than from a fresh initialization. The mini-batch sequence $B$ is reused so that the learning and unlearning processes can be compared step by step.
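The learning and unlearning loops can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the toy loss gradient `sq_grad`, the helper names, and all hyperparameter values are hypothetical, and the sketch assumes that $b$ divides $n$ and that each step adds Gaussian noise with per-coordinate standard deviation $\sqrt{2\eta\sigma^2}$.

```python
import math
import random

def project_ball(y, R):
    """Orthogonal projection onto C_R = {x : ||x||_2 <= R}."""
    norm = math.sqrt(sum(v * v for v in y))
    return list(y) if norm <= R else [v * R / norm for v in y]

def cyclic_batches(n, b):
    """Fixed cyclic sequence B^0, ..., B^{n/b-1} of index blocks (assumes b divides n)."""
    return [list(range(j * b, (j + 1) * b)) for j in range(n // b)]

def pnsgd(x_start, data, grad, batches, epochs, eta, sigma, R, rng):
    """Projected noisy SGD over a fixed batch sequence.

    Each step: x <- Proj_{C_R}(x - eta * g(x; B^j) + sqrt(2 * eta * sigma^2) * W),
    where g is the mini-batch mean gradient and W ~ N(0, I_d).
    """
    x = list(x_start)
    noise_scale = math.sqrt(2.0 * eta * sigma ** 2)
    for _ in range(epochs):
        for batch in batches:
            g = [0.0] * len(x)
            for i in batch:
                for k, gk in enumerate(grad(x, data[i])):
                    g[k] += gk / len(batch)
            y = [xk - eta * gk + noise_scale * rng.gauss(0.0, 1.0)
                 for xk, gk in zip(x, g)]
            x = project_ball(y, R)
    return x

def sq_grad(x, d):
    """Gradient of the hypothetical toy loss 0.5 * ||x - d||^2."""
    return [xi - di for xi, di in zip(x, d)]

rng = random.Random(0)
D = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
D_prime = [[0.0, 0.0]] + D[1:]  # adjacent dataset: first point replaced
B = cyclic_batches(len(D), 2)   # the same batch sequence is reused for unlearning

x_learned = pnsgd([0.0, 0.0], D, sq_grad, B, epochs=50, eta=0.1, sigma=0.05, R=2.0, rng=rng)
x_unlearned = pnsgd(x_learned, D_prime, sq_grad, B, epochs=10, eta=0.1, sigma=0.05, R=2.0, rng=rng)
```

In this sketch unlearning is simply another call to `pnsgd`, started from `x_learned` on `D_prime` with the same batch list `B`; no extra state is stored between the two calls.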

3. Theoretical Assumptions and Conditions

PNSGD unlearning operates under the following assumptions:

  • For every data point $d$, the loss $\ell(\cdot; d)$ is
    • $L$-smooth: $\|\nabla \ell(x; d) - \nabla \ell(y; d)\|_2 \le L\|x - y\|_2$
    • $\mu$-strongly convex over $C_R$
    • $M$-Lipschitz: $\|\nabla \ell(x; d)\|_2 \le M$ for all $x \in C_R$
  • The constraint set $C_R$ has nonzero Lebesgue measure.
  • $f_D$ has a continuous gradient.

These regularity conditions are essential for ensuring the convergence and contractivity needed to establish unlearning guarantees.
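As a hypothetical instance satisfying these assumptions, consider $L_2$-regularized logistic loss $\ell(x; (a, y)) = \log(1 + e^{-y\, a^\top x}) + \tfrac{\lambda}{2}\|x\|_2^2$ with $y \in \{-1, +1\}$. This loss is an illustrative choice, not one prescribed by the source. Standard bounds give $L = \max\|a\|^2/4 + \lambda$ (the logistic Hessian is at most $\|a\|^2/4$), $\mu = \lambda$, and $M = \max\|a\| + \lambda R$ on $C_R$. The sketch computes these constants and the resulting contraction factor $1 - \eta\mu$ at step size $\eta = 1/L$.

```python
import math

def logistic_constants(features, lam, R):
    """Smoothness L, strong convexity mu and gradient bound M on C_R for the
    hypothetical loss l(x; (a, y)) = log(1 + exp(-y * a.x)) + (lam / 2) * ||x||^2.
    """
    a_max = max(math.sqrt(sum(v * v for v in a)) for a in features)
    L = a_max ** 2 / 4.0 + lam   # logistic Hessian <= ||a||^2 / 4, plus lam * I
    mu = lam                     # strong convexity comes from the regularizer
    M = a_max + lam * R          # ||grad|| <= ||a|| + lam * ||x|| <= ||a|| + lam * R on C_R
    return L, mu, M

def contraction_factor(L, mu):
    """1 - eta * mu at the step size eta = 1 / L."""
    return 1.0 - mu / L

L, mu, M = logistic_constants([[3.0, 4.0], [1.0, 0.0]], lam=0.1, R=2.0)
c = contraction_factor(L, mu)
```

A larger regularizer $\lambda$ improves strong convexity (smaller contraction factor) at the cost of a larger gradient bound $M$.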

4. Certified Unlearning Guarantees

The guarantee tracks the distributions of both the training process and the unlearning process in terms of Rényi divergence, conditioned on the batch sequence $B$.

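Define the per-step contraction factor $c = 1 - \eta\mu$ (for a step size $\eta \le 1/L$) and let $Z_B$ be the initial $W_\infty$ separation between the learned model and the model obtained by training on $D'$ from the same initialization with the same batch sequence. For a single replaced point, the mismatched sample enters exactly one mini-batch per epoch and perturbs that step by at most $2\eta M/b$, while every step contracts the existing separation by $c$; summing the resulting geometric series over $T$ training epochs gives

$$Z_B = \min\left\{\frac{2\eta M}{b}\cdot\frac{1 - c^{Tn/b}}{1 - c^{n/b}},\; 2R\right\},$$

where $2R$ is the diameter of $C_R$.

The main guarantee (Theorem 3.2) holds for any $\alpha > 1$: after $K$ unlearning epochs, the unlearned model is $(\alpha, \epsilon)$-close in Rényi divergence to the retraining distribution, with $\epsilon$ scaling as $\alpha Z_B^2/(\eta\sigma^2)$ and decaying geometrically, roughly as $c^{2Kn/b}$, in the total number of unlearning steps $Kn/b$.

In the fully converged training regime ($T \to \infty$), $c^{Tn/b} \to 0$, so the dependence on $T$ vanishes and

$$Z_B = \min\left\{\frac{2\eta M}{b\,(1 - c^{n/b})},\; 2R\right\}.$$

Because $\epsilon$ decays geometrically in $Kn/b$, it suffices to take a number of unlearning epochs $K$ that grows only logarithmically in $\alpha Z_B^2/(\eta\sigma^2\epsilon)$, each epoch containing $n/b$ steps.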

The proof bounds the Rényi divergence by tracking the infinite Wasserstein ($W_\infty$) distance between the process distributions, leveraging contractive noisy iterations and a privacy-amplification-by-iteration lemma.
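The separation bound is easy to evaluate numerically. The sketch below is a hedged illustration under stated assumptions: it implements the synchronous-coupling bound $Z_B = \min\{\frac{2\eta M}{b}\frac{1 - c^{Tn/b}}{1 - c^{n/b}}, 2R\}$ with $c = 1 - \eta\mu$, and, as a hypothetical proxy, the number of unlearning epochs after which the coupled $W_\infty$ separation $c^{Kn/b} Z_B$ falls below a chosen threshold. It does not compute the Rényi $\epsilon$ of Theorem 3.2; the threshold, function names, and parameter values are assumptions.

```python
import math

def initial_separation(eta, mu, M, n, b, T, R):
    """Z_B = min{(2 eta M / b) * (1 - c^{Tn/b}) / (1 - c^{n/b}), 2R}, with c = 1 - eta * mu."""
    c = 1.0 - eta * mu
    assert 0.0 < c < 1.0, "needs 0 < eta * mu < 1"
    steps = n // b
    series = (1.0 - c ** (T * steps)) / (1.0 - c ** steps)
    return min(2.0 * eta * M / b * series, 2.0 * R)

def epochs_to_shrink(z0, target, c, steps_per_epoch):
    """Smallest K with c^(K * steps_per_epoch) * z0 <= target (coupled W_inf proxy)."""
    if z0 <= target:
        return 0
    # Both logs are negative, so the ratio is positive; ceil gives the smallest valid K.
    return math.ceil(math.log(target / z0) / (steps_per_epoch * math.log(c)))

eta, mu, M, n, b, R = 0.1, 1.0, 1.0, 100, 10, 10.0
z_short = initial_separation(eta, mu, M, n, b, T=1, R=R)    # one training epoch
z_long = initial_separation(eta, mu, M, n, b, T=200, R=R)   # close to converged
K = epochs_to_shrink(z_long, 1e-4, 1.0 - eta * mu, n // b)
```

Longer training increases $Z_B$ toward its converged value, but only up to the finite limit $2\eta M/(b(1 - c^{n/b}))$, so the required $K$ stays bounded.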

5. Computational Complexity and Comparison with Retraining

Retraining from scratch starts from an initial $W_\infty$ distance that can be as large as $2R$, the diameter of $C_R$, so it needs enough epochs to contract a separation of that size. By contrast, PNSGD unlearning starts at the current model, whose initial separation is only $Z_B$. Because the required number of epochs grows logarithmically in the initial separation, the saving relative to retraining is governed by $\log(2R/Z_B)$; in the converged regime with a moderate batch size $b$ (so that $c^{n/b} \approx 0$), $Z_B \approx 2\eta M/b$ and this ratio is roughly $Rb/(\eta M)$.

In full-batch mode ($b = n$), the process recovers PNGD unlearning with per-step contraction $c$. When $b < n$, each epoch consists of $n/b$ contractive steps, so the per-epoch contraction improves to $c^{n/b}$, that is, exponentially in $n/b$. Empirically, fewer epochs are needed than in the full-batch setting. Reported experiments demonstrate that, under comparable privacy constraints, PNSGD achieves similar utility using only 2% (mini-batch) and 10% (full-batch) of the gradient computations required by state-of-the-art gradient-based approximate unlearning baselines (Chien et al., 2024).
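Under the same hypothetical $W_\infty$ proxy (epochs until $c^{Kn/b} Z \le \tau$ for a chosen threshold $\tau$), one can compare retraining, which starts from a separation of $2R$, with unlearning, which starts from the converged $Z_B$, in both full-batch and mini-batch mode. This is a sketch of the mechanism only; it does not reproduce the paper's experiments, and all names and parameter values are assumptions.

```python
import math

def epochs_needed(z0, target, eta, mu, n, b):
    """Epochs K until c^(K n/b) * z0 <= target, with c = 1 - eta * mu (W_inf proxy)."""
    c = 1.0 - eta * mu
    if z0 <= target:
        return 0
    return math.ceil(math.log(target / z0) / ((n // b) * math.log(c)))

def converged_separation(eta, mu, M, n, b, R):
    """Z_B in the converged regime: min{2 eta M / (b (1 - c^{n/b})), 2R}."""
    c = 1.0 - eta * mu
    return min(2.0 * eta * M / (b * (1.0 - c ** (n // b))), 2.0 * R)

def compare(eta, mu, M, n, b, R, target):
    """Epochs and per-example gradient counts: retraining (from 2R) vs unlearning (from Z_B)."""
    k_retrain = epochs_needed(2.0 * R, target, eta, mu, n, b)
    k_unlearn = epochs_needed(converged_separation(eta, mu, M, n, b, R), target, eta, mu, n, b)
    # One epoch is one pass over all n examples, i.e. n per-example gradients.
    return {"retrain_epochs": k_retrain, "unlearn_epochs": k_unlearn,
            "retrain_grads": k_retrain * n, "unlearn_grads": k_unlearn * n}

full = compare(eta=0.1, mu=1.0, M=1.0, n=1000, b=1000, R=10.0, target=1e-3)
mini = compare(eta=0.1, mu=1.0, M=1.0, n=1000, b=10, R=10.0, target=1e-3)
```

With the same step size, the mini-batch run performs $n/b$ contractive steps per epoch, which is why it needs far fewer epochs than the full-batch run under this proxy.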

6. Sequential and Batch Unlearning Extensions

PNSGD efficiently supports both sequential and batch removal scenarios:

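  • Sequential removal of $S$ points: For a sequence of datasets $D_0, D_1, \dots, D_S$, each differing from its predecessor in one point, and after $K_s$ unlearning epochs for the $s$-th request, the initial separations satisfy

$$Z^{(1)} = Z_B, \qquad Z^{(s+1)} \le \min\left\{c^{K_s n/b}\, Z^{(s)} + Z_B,\; 2R\right\}.$$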

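The same Rényi bound then applies to each request, with the current separation in place of $Z_B$. Because $W_\infty$ satisfies the triangle inequality exactly, separations from successive requests add rather than compound, and the Rényi order $\alpha$ does not grow exponentially with the number of requests, so sequential unlearning remains tractable.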

  • Batch removal of $S$ points: For $D$ and $D'$ differing in $S$ points, with $S_j$ of the replaced points falling in mini-batch $B^j$ (so that $\sum_j S_j = S$), the initial separation is bounded by

$$Z_B^{(S)} \le \min\left\{\frac{2\eta M S}{b}\cdot\frac{1 - c^{Tn/b}}{1 - c^{n/b}},\; 2R\right\}.$$

The initial separation increases linearly with $S$; the same converged Rényi bound then applies with this larger separation in place of $Z_B$.

No auxiliary "private state" is needed for sequential or batch requests, preserving practical deployability.
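Both extensions can be checked numerically. This sketch assumes, for sequential requests, the recursion $Z^{(s+1)} \le \min\{\rho Z^{(s)} + Z_B, 2R\}$ with a fixed $\rho = c^{Kn/b}$ per request, and, for batch removal, a synchronous-coupling bound in which the discrepancy injected at step $j$ (at most $2\eta M S_j/b$) is contracted by the remaining steps of its epoch. Function names and parameter values are hypothetical. By induction, if $Z^{(s)} \le Z_B/(1-\rho)$ then $\rho Z^{(s)} + Z_B \le Z_B/(1-\rho)$, so the sequential separations stay bounded no matter how many requests arrive.

```python
def sequential_separations(z_b, rho, R, num_requests):
    """Initial W_inf separation before each request under the assumed recursion
    Z(1) = z_b, Z(s+1) = min(rho * Z(s) + z_b, 2R), where rho = c^(K n/b)."""
    seps = [min(z_b, 2.0 * R)]
    for _ in range(num_requests - 1):
        seps.append(min(rho * seps[-1] + z_b, 2.0 * R))
    return seps

def batch_separation(eta, mu, M, n, b, T, R, counts):
    """Coupling bound for S = sum(counts) replaced points, counts[j] of them in B^j.

    The discrepancy injected at step j (at most 2 eta M counts[j] / b) is contracted
    by the n/b - 1 - j remaining steps of its epoch; epochs then combine geometrically.
    """
    c = 1.0 - eta * mu
    steps = n // b
    per_epoch = sum(2.0 * eta * M * s_j / b * c ** (steps - 1 - j)
                    for j, s_j in enumerate(counts))
    series = (1.0 - c ** (T * steps)) / (1.0 - c ** steps)
    return min(per_epoch * series, 2.0 * R)

seps = sequential_separations(z_b=0.03, rho=0.5, R=10.0, num_requests=20)
one = batch_separation(0.1, 1.0, 1.0, 100, 10, 50, 10.0, [1] + [0] * 9)
five = batch_separation(0.1, 1.0, 1.0, 100, 10, 50, 10.0, [5] + [0] * 9)
```

Because each per-step term is at most $2\eta M S_j/b$, this position-aware bound never exceeds the simpler bound that is linear in $S$.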

7. Significance and Application Context

PNSGD provides certified $(\alpha, \epsilon)$-Rényi unlearning for convex problems with guarantees rooted in process-level distributional contractivity and infinite Wasserstein metrics. Its complexity saving is theoretically quantified relative to retraining, and it enables unlearning workflows that address both isolated and multiple-point removal scenarios without algorithmic modifications.

The approach directly addresses legal and regulatory mandates such as data erasure requirements, and is applicable wherever compliance with user-initiated deletion requests is critical and retraining from scratch is computationally prohibitive. Its applicability is bounded by the convexity and regularity assumptions described above. The development of PNSGD establishes a foundational framework for certified, efficient, and extensible machine unlearning in convex empirical risk settings (Chien et al., 2024).
