PNSGD: Projected Noisy SGD Unlearning
- PNSGD is a certified machine unlearning method for convex empirical risk minimization that leverages projected noisy SGD to approximate data removal without full retraining.
- It uses rigorous mathematical tools—tracking Rényi divergence via infinite Wasserstein distance—to certify that unlearned models closely match retrained-from-scratch distributions.
- The algorithm efficiently addresses both sequential and batch removal scenarios, significantly reducing computational overhead compared to complete retraining.
Projected Noisy SGD Unlearning (PNSGD) is a certified machine unlearning framework for convex empirical risk minimization problems. PNSGD achieves approximate removal of individual or multiple data points from a trained model’s influence, producing model parameter distributions that are close (in the Rényi sense) to the retraining-from-scratch distribution. Its central algorithmic innovation leverages projected noisy stochastic gradient descent with rigorous complexity guarantees and efficient handling of both sequential and batch unlearning scenarios. PNSGD is the first method to offer approximate unlearning guarantees for convex losses under the projected noisy SGD regime, using infinite Wasserstein distance to track the separation between adjacent learning processes (Chien et al., 2024).
1. Problem Setup and Objective
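Given a training set $\mathcal{D} = \{d_1, \dots, d_n\}$, the empirical risk objective is
$$\min_{x \in \mathcal{C}_R} f_{\mathcal{D}}(x), \qquad f_{\mathcal{D}}(x) = \frac{1}{n} \sum_{i=1}^{n} f(x; d_i),$$
where $\mathcal{C}_R = \{x \in \mathbb{R}^d : \|x\|_2 \le R\}$ is a closed Euclidean ball. Upon a user removal request (e.g., to comply with "the right to be forgotten"), an adjacent dataset $\mathcal{D}'$ is defined, differing from $\mathcal{D}$ by replacement of some point(s). The unlearning goal is to transform an existing model into a new model whose distribution is $(\alpha, \varepsilon)$-close, in the Rényi sense, to the distribution obtained by retraining from scratch on $\mathcal{D}'$, all without rerunning SGD from initialization.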
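The objective and constraint set can be made concrete with a short sketch. The per-example loss is not fixed by the method. Purely for illustration, this assumes an L2-regularized logistic loss $f(x; (a, y)) = \log(1 + e^{-y a^\top x}) + \frac{\lambda}{2}\|x\|^2$. The names `empirical_risk` and `project_ball` are hypothetical.

```python
import numpy as np


def empirical_risk(x, A, Y, lam):
    """f_D(x) = (1/n) * sum_i f(x; d_i) for a hypothetical L2-regularized logistic loss.

    A has one feature vector a_i per row; Y holds labels y_i in {-1, +1}.
    """
    margins = Y * (A @ x)
    # np.logaddexp(0, -t) computes log(1 + exp(-t)) without overflow.
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * float(x @ x)


def project_ball(x, R):
    """Orthogonal projection onto the closed ball C_R = {x : ||x||_2 <= R}."""
    norm = np.linalg.norm(x)
    # Points inside the ball are unchanged; points outside are rescaled onto the sphere.
    return x if norm <= R else x * (R / norm)
```

Projection onto a centered ball is just radial rescaling. This is why $\mathcal{C}_R$ is a convenient constraint set: the projection is cheap and 1-Lipschitz.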
2. PNSGD Algorithmic Procedure
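The PNSGD procedure relies on a fixed mini-batch sequence $\mathcal{B} = \{\mathcal{B}^0, \dots, \mathcal{B}^{n/b-1}\}$ of batches of size $b$ (with $b$ dividing $n$), visited cyclically. The learning process operates as follows: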
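- Initialization: Start from $x_0^0 \sim \nu_0$ supported on $\mathcal{C}_R$.
- Iterative update (epoch $k$, inner step $j = 0, \dots, n/b - 1$):
  $$x_k^{j+1} = \Pi_{\mathcal{C}_R}\left(x_k^j - \eta\, g(x_k^j, \mathcal{B}^j) + \sqrt{2\eta\sigma^2}\, W_k^j\right), \qquad x_{k+1}^0 = x_k^{n/b},$$
  where
  - $g(x, \mathcal{B}^j) = \frac{1}{b} \sum_{i \in \mathcal{B}^j} \nabla f(x; d_i)$ is the mini-batch gradient,
  - $W_k^j \sim \mathcal{N}(0, I_d)$ is additive Gaussian noise,
  - $\Pi_{\mathcal{C}_R}$ is the orthogonal projection onto $\mathcal{C}_R$.
- After $T$ epochs, set $x_T^0$ as the current learned model.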
Unlearning is achieved by rerunning the above update on $\mathcal{D}'$ for $K$ epochs, starting from $x_T^0$ (the previously trained model). The mini-batch sequence $\mathcal{B}$ is reused for direct comparability.
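The learning and unlearning loops differ only in the dataset and the starting point. A minimal NumPy sketch follows, assuming a hypothetical L2-regularized logistic loss, a zero initialization standing in for $\nu_0$, and hypothetical helper names (`minibatch_grad`, `pnsgd`). It is an illustration of the update rule, not the authors' implementation.

```python
import numpy as np


def project_ball(x, R):
    """Orthogonal projection onto C_R = {x : ||x||_2 <= R}."""
    norm = np.linalg.norm(x)
    return x if norm <= R else x * (R / norm)


def minibatch_grad(x, A, Y, idx, lam):
    """g(x, B^j): mean gradient of the (hypothetical) L2-regularized logistic loss over batch idx."""
    a, y = A[idx], Y[idx]
    margins = y * (a @ x)
    # d/dx log(1 + exp(-y a^T x)) = -y a / (1 + exp(y a^T x))
    coef = -y / (1.0 + np.exp(margins))
    return (coef[:, None] * a).mean(axis=0) + lam * x


def pnsgd(x0, A, Y, batches, epochs, eta, sigma, R, lam, rng):
    """Run `epochs` passes of projected noisy SGD over a fixed, cyclic batch sequence.

    Learning: x0 ~ nu_0 and data D. Unlearning: x0 = trained model and data D'.
    """
    x = np.array(x0, dtype=float)
    noise_std = np.sqrt(2.0 * eta * sigma**2)
    for _ in range(epochs):
        for idx in batches:  # B^0, ..., B^{n/b - 1}, same order every epoch
            g = minibatch_grad(x, A, Y, idx, lam)
            w = rng.standard_normal(x.shape)
            x = project_ball(x - eta * g + noise_std * w, R)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, b = 64, 3, 8
    A = rng.normal(size=(n, d))
    Y = np.where(rng.normal(size=n) > 0, 1.0, -1.0)
    batches = [np.arange(j, j + b) for j in range(0, n, b)]
    hp = dict(eta=0.1, sigma=0.05, R=2.0, lam=0.1)

    x_T = pnsgd(np.zeros(d), A, Y, batches, epochs=50, rng=rng, **hp)  # learning on D
    A2, Y2 = A.copy(), Y.copy()
    A2[0], Y2[0] = rng.normal(size=d), 1.0                              # D': replace point 0
    x_U = pnsgd(x_T, A2, Y2, batches, epochs=5, rng=rng, **hp)          # unlearning on D'
    print(np.linalg.norm(x_U - x_T))
```

Keeping `batches` identical across both calls mirrors the reuse of $\mathcal{B}$. Only the data arrays and the starting point change.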
3. Theoretical Assumptions and Conditions
PNSGD unlearning operates under the following assumptions:
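- For every data point $d \in \mathcal{X}$, $f(x; d)$ is
  - $L$-smooth: $\|\nabla f(x; d) - \nabla f(y; d)\| \le L \|x - y\|$,
  - $m$-strongly convex over $\mathcal{C}_R$,
  - $M$-Lipschitz: $\|\nabla f(x; d)\| \le M$ for all $x \in \mathcal{C}_R$.
- The constraint set $\mathcal{C}_R$ has nonzero Lebesgue measure.
- $f(x; d)$ has a continuous gradient in $x$.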
These regularity conditions are essential for ensuring the convergence and contractivity needed to establish unlearning guarantees.
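As a worked example of the constants, consider the hypothetical L2-regularized logistic loss $f(x; (a, y)) = \log(1 + e^{-y a^\top x}) + \frac{\lambda}{2}\|x\|^2$ on $\mathcal{C}_R$. The three constants then follow from standard bounds on the sigmoid: $L = \max_i \|a_i\|^2/4 + \lambda$, $m = \lambda$, and $M = \max_i \|a_i\| + \lambda R$. The loss choice and the function name are assumptions made for illustration.

```python
import numpy as np


def logistic_constants(A, lam, R):
    """(L, m, M) for the hypothetical per-example loss
    f(x; (a, y)) = log(1 + exp(-y a^T x)) + (lam / 2) ||x||^2 on C_R = {||x|| <= R}.
    """
    max_norm = float(np.max(np.linalg.norm(A, axis=1)))
    L = max_norm**2 / 4.0 + lam  # sigmoid'(t) <= 1/4 bounds the logistic Hessian
    m = lam                      # logistic term is convex; strong convexity comes from lam
    M = max_norm + lam * R       # |sigmoid| <= 1 and ||x|| <= R bound the gradient norm
    return L, m, M


A = np.array([[3.0, 4.0], [1.0, 0.0]])
L, m, M = logistic_constants(A, lam=0.5, R=2.0)
eta = 1.0 / L          # largest step size allowed by eta <= 1/L
c = 1.0 - eta * m      # per-step contraction factor used in the guarantees
print(L, m, M, c)
```

Without the $\lambda$ term the logistic loss is convex but not strongly convex, so $m = 0$ and the contraction factor degenerates to $1$.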
4. Certified Unlearning Guarantees
The guarantee tracks the distributions of both the training process and the unlearning process in terms of Rényi divergence, conditioned on the batch sequence $\mathcal{B}$.
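Define $c = 1 - \eta m$ (for step size $\eta \le 1/L$), and the initial $W_\infty$ separation
$$Z_{\mathcal{B}} = \min\left( \frac{1 - c^{Tn/b}}{1 - c^{n/b}} \cdot \frac{2\eta M}{b},\; 2R \right).$$
This quantity bounds the $W_\infty$ distance between the laws of $x_T^0$ trained on $\mathcal{D}$ and on $\mathcal{D}'$. Each epoch contracts that distance by $c^{n/b}$, and the single mini-batch containing the replaced point adds at most $2\eta M/b$. The cap $2R$ is the diameter of $\mathcal{C}_R$.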
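The definitions of $c$ and $Z_{\mathcal{B}}$ translate directly into a small calculator, shown below as a sketch. The function names are hypothetical. `T = math.inf` stands for the fully converged regime, where $c^{Tn/b} = 0$.

```python
import math


def contraction(eta, m, L):
    """c = 1 - eta * m, valid for m > 0 and 0 < eta <= 1/L."""
    if m <= 0:
        raise ValueError("strong convexity constant m must be positive")
    if not 0.0 < eta <= 1.0 / L:
        raise ValueError("step size must satisfy 0 < eta <= 1/L")
    return 1.0 - eta * m


def initial_separation(eta, m, L, M, n, b, T, R):
    """Z_B = min((1 - c^{Tn/b}) / (1 - c^{n/b}) * 2 eta M / b, 2R)."""
    if n % b != 0:
        raise ValueError("b must divide n")
    c = contraction(eta, m, L)
    steps = n // b                                       # mini-batch steps per epoch
    decay = 0.0 if math.isinf(T) else c ** (T * steps)   # c^{Tn/b}
    growth = (1.0 - decay) / (1.0 - c**steps)            # geometric sum over T epochs
    return min(growth * 2.0 * eta * M / b, 2.0 * R)
```

With `T = 0` both training runs start from the same initialization, so the separation is zero. As `T` grows, it increases monotonically toward the converged value.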
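The main guarantee (Theorem 3.2) holds for any $\alpha > 1$ and yields a bound of the form
$$D_\alpha\left(\nu^{U}_{K} \,\|\, \nu'_{T+K}\right) \le \varepsilon, \qquad \varepsilon \le \frac{\alpha Z_{\mathcal{B}}^2}{4\eta\sigma^2}\, c^{2Kn/b},$$
where
- $\nu^{U}_{K}$ is the law of the model after $K$ unlearning epochs started from $x_T^0$,
- $\nu'_{T+K}$ is the law of retraining from scratch on $\mathcal{D}'$ for $T + K$ epochs.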
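In the fully converged training regime ($T \to \infty$), $c^{Tn/b} \to 0$, so the dependence on $T$ vanishes and
$$Z_{\mathcal{B}} = \min\left( \frac{2\eta M}{b\,(1 - c^{n/b})},\; 2R \right).$$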
It suffices to take
$$K \ge \frac{b}{2n \log(1/c)} \log\frac{\alpha Z_{\mathcal{B}}^2}{4\eta\sigma^2 \varepsilon}$$
unlearning epochs, each containing $n/b$ steps.
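The epoch requirement can be checked numerically with the sketch below. It assumes the bound has the form $\varepsilon(K) = \alpha Z^2 c^{2Kn/b} / (4\eta\sigma^2)$. If the exact constant in the original theorem differs, only the `epsilon_bound` line changes. Both function names are hypothetical.

```python
import math


def epsilon_bound(K, alpha, Z, c, n, b, eta, sigma):
    """Assumed bound form: eps(K) = alpha * Z^2 * c^(2Kn/b) / (4 * eta * sigma^2)."""
    return alpha * Z**2 * c ** (2.0 * K * n / b) / (4.0 * eta * sigma**2)


def epochs_needed(eps_target, alpha, Z, c, n, b, eta, sigma):
    """Smallest integer K >= 0 with epsilon_bound(K, ...) <= eps_target."""
    args = (alpha, Z, c, n, b, eta, sigma)
    eps0 = epsilon_bound(0, *args)
    if eps0 <= eps_target:
        return 0
    # Closed form K >= b / (2 n log(1/c)) * log(eps0 / eps_target) ...
    K = math.ceil(b * math.log(eps0 / eps_target) / (2.0 * n * math.log(1.0 / c)))
    # ... then nudge K to absorb floating-point rounding at the boundary.
    while epsilon_bound(K, *args) > eps_target:
        K += 1
    while K > 0 and epsilon_bound(K - 1, *args) <= eps_target:
        K -= 1
    return K
```

Because $\varepsilon$ decays geometrically in $K$, the required number of epochs grows only logarithmically in $\alpha Z_{\mathcal{B}}^2 / \varepsilon$.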
The proof bounds the Rényi divergence by tracking the infinite Wasserstein ($W_\infty$) distance between process distributions, leveraging contractive noisy iteration and a privacy-amplification-by-iteration lemma.
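The proof idea is illustrated below with a standard shifted-Rényi computation for contractive noisy iteration. This is a generic textbook-style argument, and the constants in the paper's lemma may differ. Suppose $N$ noisy steps are each $c$-contractive and add noise of variance $s^2 = 2\eta\sigma^2$, and the two processes start at $W_\infty$ distance $Z$. A sequence of shifts $a_i$ must then absorb the distance, and the divergence is at most $\frac{\alpha}{2s^2}\sum_i a_i^2$. Spending the whole shift at the last step recovers the $\alpha Z^2 c^{2N}/(4\eta\sigma^2)$ form. Spreading it optimally can only be tighter. Function names are hypothetical.

```python
import numpy as np


def gaussian_shift_renyi(a, alpha, s2):
    """D_alpha(N(a, s2 I) || N(0, s2 I)) = alpha * ||a||^2 / (2 * s2)."""
    return alpha * a**2 / (2.0 * s2)


def pabi_bound(Z, c, N, alpha, eta, sigma):
    """Shifted-Renyi bound for N c-contractive noisy steps started at W_inf distance Z.

    Shifts a_1..a_N must satisfy sum_i c^(N-i) a_i = c^N Z so the shifted distance hits 0.
    """
    s2 = 2.0 * eta * sigma**2                  # variance of sqrt(2 eta sigma^2) * W
    weights = c ** (N - np.arange(1, N + 1))    # c^(N-i), i = 1..N
    # Cauchy-Schwarz optimum: a_i proportional to c^(N-i)
    shifts = c**N * Z * weights / np.sum(weights**2)
    optimal = float(np.sum(gaussian_shift_renyi(shifts, alpha, s2)))
    single = float(gaussian_shift_renyi(c**N * Z, alpha, s2))  # whole shift at the last step
    return optimal, single, shifts
```

The $W_\infty$ distance enters only through $Z$. This is why tracking it, rather than the Rényi divergence directly, makes contraction easy to exploit.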
5. Computational Complexity and Comparison with Retraining
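Retraining from scratch requires $K_{\text{retrain}}$ epochs. This follows from the epoch requirement above with $Z_{\mathcal{B}}$ replaced by $2R$, because the initial $W_\infty$ distance to the target distribution can be as large as the diameter $2R$ of $\mathcal{C}_R$. By contrast, PNSGD unlearning starts from the current model. Its $W_\infty$ distance to the retraining distribution is at most $Z_{\mathcal{B}}$, which is typically far smaller than $2R$. Since the required number of epochs grows only logarithmically in the initial distance, the saving is approximately
$$K_{\text{retrain}} - K \approx \frac{b}{n \log(1/c)} \log\frac{2R}{Z_{\mathcal{B}}}$$
epochs.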
In full-batch mode ($b = n$), the process recovers PNGD (projected noisy gradient descent) unlearning with per-epoch contraction $c$. When $b < n$, each epoch performs $n/b$ contractive steps at the same per-epoch gradient cost. The per-epoch contraction therefore improves to $c^{n/b}$, and convergence accelerates exponentially in $n/b$. Empirically, mini-batch unlearning needs fewer epochs than full-batch unlearning. Reported experiments show that under comparable privacy constraints, PNSGD achieves similar utility using only 2% (mini-batch) and 10% (full-batch) of the gradient computations required by state-of-the-art gradient-based approximate unlearning baselines (Chien et al., 2024).
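The comparison above can be reproduced on toy numbers. The sketch assumes the bound form $\alpha Z^2 c^{2K \cdot \text{steps}} / (4\eta\sigma^2)$, with $n/b$ steps per epoch. It computes epochs for three starting points: unlearning from the trained model ($Z = Z_{\mathcal{B}}$), retraining ($Z = 2R$), and full-batch unlearning ($b = n$, one step per epoch). All parameter values and the helper name `epochs_for` are hypothetical.

```python
import math


def epochs_for(Z, eps_target, alpha, c, steps_per_epoch, eta, sigma):
    """Smallest K with alpha Z^2 c^(2 K steps) / (4 eta sigma^2) <= eps_target (assumed form)."""
    def eps(K):
        return alpha * Z**2 * c ** (2 * K * steps_per_epoch) / (4.0 * eta * sigma**2)
    K = 0
    while eps(K) > eps_target:  # terminates because 0 < c < 1
        K += 1
    return K


eta, m, M, n, R = 0.1, 1.0, 2.0, 100, 10.0
alpha, sigma, eps_target = 2.0, 0.05, 0.01
c = 1.0 - eta * m

# Mini-batch (b = 10): n/b steps per epoch, converged Z_B.
b = 10
z_mini = 2 * eta * M / (b * (1 - c ** (n // b)))
k_unlearn = epochs_for(z_mini, eps_target, alpha, c, n // b, eta, sigma)
k_retrain = epochs_for(2 * R, eps_target, alpha, c, n // b, eta, sigma)

# Full batch (b = n): one step per epoch, per-epoch contraction c.
z_full = 2 * eta * M / (n * (1 - c))
k_full = epochs_for(z_full, eps_target, alpha, c, 1, eta, sigma)
print(k_unlearn, k_retrain, k_full)
```

Each epoch costs $n$ per-example gradients in every configuration, so epoch counts compare gradient budgets directly.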
6. Sequential and Batch Unlearning Extensions
PNSGD efficiently supports both sequential and batch removal scenarios:
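- Sequential removal of $S$ points: Consider a sequence of datasets $\mathcal{D}_0, \mathcal{D}_1, \dots, \mathcal{D}_S$, each differing from its predecessor in one point, with $K$ unlearning epochs per step. The initial separation for the $s$-th request then satisfies
  $$Z_{\mathcal{B}}^{(s)} \le \min\left( c^{Kn/b} Z_{\mathcal{B}}^{(s-1)} + Z_{\mathcal{B}},\; 2R \right), \qquad Z_{\mathcal{B}}^{(1)} = Z_{\mathcal{B}}.$$
  The same Rényi bound applies per step with $Z_{\mathcal{B}}$ replaced by $Z_{\mathcal{B}}^{(s)}$, which never exceeds $Z_{\mathcal{B}} / (1 - c^{Kn/b})$. The triangle inequality for $W_\infty$ keeps this separation bounded. The Rényi order $\alpha$ also does not grow exponentially with the number of requests, so sequential unlearning requests remain tractable.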
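- Batch removal of $S$ points: Let $\mathcal{D}'$ differ from $\mathcal{D}$ in $S$ points, spread over $S'$ mini-batches $\mathcal{B}^{j_1}, \dots, \mathcal{B}^{j_{S'}}$ with $S_i$ replacements in $\mathcal{B}^{j_i}$ (so $\sum_i S_i = S$). The initial separation is then bounded by
  $$Z_{\mathcal{B}}^{(S)} = \min\left( \frac{1 - c^{Tn/b}}{1 - c^{n/b}} \cdot \frac{2\eta M S}{b},\; 2R \right).$$
  The initial separation increases linearly with $S$ (up to the $2R$ cap), and the same converged Rényi bound applies with $Z_{\mathcal{B}}^{(S)}$ in place of $Z_{\mathcal{B}}$.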
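Both extensions reduce to simple updates of the initial separation, sketched below. The sequential case assumes the recursion $Z^{(s)} = \min(c^{Kn/b} Z^{(s-1)} + Z_{\mathcal{B}}, 2R)$, which comes from contraction followed by the $W_\infty$ triangle inequality. The batch case assumes the bound $\frac{1 - c^{Tn/b}}{1 - c^{n/b}} \cdot \frac{2\eta M S}{b}$ capped at $2R$. Function names are hypothetical.

```python
def sequential_separations(Z_B, c, K, n, b, R, num_requests):
    """Initial separations Z^(1..S): Z^(1) = Z_B, Z^(s) = min(c^(Kn/b) Z^(s-1) + Z_B, 2R)."""
    rho = c ** (K * n / b)            # W_inf contraction over K unlearning epochs
    seps = [min(Z_B, 2.0 * R)]
    for _ in range(num_requests - 1):
        # Triangle inequality: leftover distance after unlearning + gap to the new adjacent dataset.
        seps.append(min(rho * seps[-1] + Z_B, 2.0 * R))
    return seps


def batch_separation(eta, M, c, n, b, T, R, replacements_per_batch):
    """Z_B^(S) = min((1 - c^(Tn/b)) / (1 - c^(n/b)) * 2 eta M S / b, 2R)."""
    S = sum(replacements_per_batch)   # S_i replacements in the i-th affected mini-batch
    steps = n // b
    growth = (1.0 - c ** (T * steps)) / (1.0 - c**steps)
    return min(growth * 2.0 * eta * M * S / b, 2.0 * R)
```

The sequential separations form an increasing sequence that converges to the fixed point $Z_{\mathcal{B}} / (1 - c^{Kn/b})$. This is why a constant number of epochs per request suffices.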
No auxiliary "private state" is needed for sequential or batch requests, preserving practical deployability.
7. Significance and Application Context
PNSGD provides certified $(\alpha, \varepsilon)$-Rényi unlearning for convex problems with guarantees rooted in process-level distributional contractivity and infinite Wasserstein metrics. Its complexity saving is theoretically quantified relative to retraining, and it enables unlearning workflows that address both isolated and multiple-point removal scenarios without algorithmic modifications.
The approach directly addresses legal and regulatory mandates such as data erasure requirements, and is applicable wherever compliance with user-initiated deletion requests is critical and retraining from scratch is computationally prohibitive. Its applicability is bounded by the convexity and regularity assumptions described above. The development of PNSGD establishes a foundational framework for certified, efficient, and extensible machine unlearning in convex empirical risk settings (Chien et al., 2024).