
Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations (2403.02051v1)

Published 4 Mar 2024 in stat.ML, cs.CR, cs.LG, math.ST, and stat.TH

Abstract: Injecting heavy-tailed noise to the iterates of stochastic gradient descent (SGD) has received increasing attention over the past few years. While various theoretical properties of the resulting algorithm have been analyzed mainly from learning theory and optimization perspectives, their privacy preservation properties have not yet been established. Aiming to bridge this gap, we provide differential privacy (DP) guarantees for noisy SGD, when the injected noise follows an $\alpha$-stable distribution, which includes a spectrum of heavy-tailed distributions (with infinite variance) as well as the Gaussian distribution. Considering the $(\epsilon, \delta)$-DP framework, we show that SGD with heavy-tailed perturbations achieves $(0, \tilde{\mathcal{O}}(1/n))$-DP for a broad class of loss functions which can be non-convex, where $n$ is the number of data points. As a remarkable byproduct, contrary to prior work that necessitates bounded sensitivity for the gradients or clipping the iterates, our theory reveals that under mild assumptions, such a projection step is not actually necessary. We illustrate that the heavy-tailed noising mechanism achieves similar DP guarantees compared to the Gaussian case, which suggests that it can be a viable alternative to its light-tailed counterparts.


Summary

  • The paper establishes (0, Õ(1/n))-DP guarantees for noisy SGD with heavy-tailed (α-stable) noise, where n is the number of data points, without requiring gradient clipping.
  • It employs Lyapunov functions and Markov process theory to derive time-uniform privacy bounds that hold for a broad class of loss functions.
  • The findings indicate that heavy-tailed noise can retain differential privacy while potentially improving empirical utility relative to Gaussian noise.

Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations

Introduction

Stochastic Gradient Descent (SGD) is a cornerstone of machine learning, widely used for optimizing a variety of loss functions. Injecting noise into the SGD iterates, particularly heavy-tailed noise, has attracted increasing interest due to its potential benefits for both data privacy and learning performance. This paper presents a rigorous analysis of the differential privacy (DP) guarantees provided by noisy SGD when the added noise follows an α-stable distribution.
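
To make the noise model concrete, here is a minimal sketch of sampling symmetric α-stable noise via the classical Chambers–Mallows–Stuck method; the function name and scale conventions are illustrative choices, not taken from the paper. At α = 2 the samples are Gaussian (with variance 2 under this standard parameterization), while α < 2 yields heavy tails with infinite variance.

```python
import numpy as np

def sample_symmetric_alpha_stable(alpha, size, rng):
    """Symmetric alpha-stable samples via the Chambers-Mallows-Stuck method.

    alpha in (0, 2]: alpha = 2 recovers a Gaussian (variance 2 in this
    parameterization); alpha < 2 gives heavy tails with infinite variance.
    """
    v = rng.uniform(-np.pi / 2, np.pi / 2, size=size)  # V ~ Uniform(-pi/2, pi/2)
    w = rng.exponential(1.0, size=size)                # W ~ Exp(1)
    if np.isclose(alpha, 1.0):
        return np.tan(v)  # alpha = 1 is the standard Cauchy distribution
    return (np.sin(alpha * v) / np.cos(v) ** (1 / alpha)
            * (np.cos((1 - alpha) * v) / w) ** ((1 - alpha) / alpha))

rng = np.random.default_rng(0)
for a in (2.0, 1.8, 1.2):
    x = sample_symmetric_alpha_stable(a, 100_000, rng)
    print(f"alpha={a}: max |sample| = {np.max(np.abs(x)):.1f}")  # tails fatten as alpha drops
```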

Main Contributions

The paper's primary contribution is the establishment of (0,δ)-DP guarantees, with δ = Õ(1/n), for noisy SGD with heavy-tailed perturbations under the (ϵ,δ)-differential privacy framework. The analysis covers a broad class of loss functions, including non-convex ones, and encompasses both heavy-tailed α-stable distributions and the Gaussian special case. Key findings include:

  • DP Guarantees without Bounded Gradients: Under specific conditions, including pseudo-Lipschitz continuity of the gradients and high-probability boundedness of the data, noisy SGD enjoys DP guarantees without requiring gradient clipping or bounded-sensitivity assumptions (see the sketch after this list).
  • Time-Uniform Bounds: The derived DP bounds are uniform over time, meaning they do not degrade with an increasing number of iterations.
  • Applicability to Heavy-Tailed Noise: The guarantees cover the full family of α-stable distributions, yielding privacy bounds comparable to the Gaussian case that are largely unaffected by the heaviness of the noise tails.
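
For illustration, the following is a hypothetical minimal implementation of the perturbed recursion on a toy least-squares problem; note the absence of any clipping or projection step. The function `noisy_sgd`, its hyperparameters, and the way the noise is coupled to the step size are placeholder choices for this sketch, not the scalings used in the paper's analysis. `levy_stable` is SciPy's α-stable distribution; β = 0 gives symmetric noise and α = 2 recovers the Gaussian mechanism.

```python
import numpy as np
from scipy.stats import levy_stable

def noisy_sgd(X, y, alpha=1.8, lr=0.05, sigma=0.1, n_iters=1000, seed=0):
    """Noisy SGD on a least-squares loss: iterates perturbed with symmetric
    alpha-stable noise, with no gradient clipping or projection step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)                      # single-sample stochastic gradient
        grad = (X[i] @ theta - y[i]) * X[i]      # grad of 0.5 * (x_i^T theta - y_i)^2
        noise = levy_stable.rvs(alpha, 0.0, scale=sigma, size=d, random_state=rng)
        theta = theta - lr * (grad + noise)      # perturbed update, no clipping
    return theta

# Toy usage: recover a linear model from noisy observations.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=200)
print(noisy_sgd(X, y) - w_true)  # residuals stay small, up to the injected noise
```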

Technical Approach

The analytical approach involves:

  • Assessing the stability and ergodicity properties of the noisy SGD iterates via a novel technique based on constructing suitable Lyapunov functions.
  • Employing recent results from Markov process theory to bound the total variation (TV) distance between the laws of two noisy SGD processes run on datasets differing in a single data point; this TV bound translates directly into the main DP results (made precise below).
  • Analyzing how the properties of the noise distribution, especially its tail behavior, affect the DP guarantees of the algorithm.
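
To spell out the link between the TV distance and the stated privacy guarantee, recall the standard (ϵ,δ)-DP definition (see, e.g., Dwork and Roth's textbook); the derivation below is textbook material included for orientation, not a quotation from the paper.

```latex
% (\epsilon, \delta)-DP: for all neighboring datasets D, D' (differing in one
% record) and all measurable sets S,
\Pr[M(D) \in S] \;\le\; e^{\epsilon}\,\Pr[M(D') \in S] \;+\; \delta .
% Setting \epsilon = 0 and taking the supremum over S (in both directions,
% since D and D' may be swapped) yields exactly a total-variation bound:
(0, \delta)\text{-DP}
  \;\Longleftrightarrow\;
  \sup_{S}\,\bigl|\Pr[M(D) \in S] - \Pr[M(D') \in S]\bigr|
  \;=\; \mathrm{TV}\bigl(\mathrm{Law}(M(D)),\,\mathrm{Law}(M(D'))\bigr)
  \;\le\; \delta .
```

This equivalence is why time-uniform TV bounds between the two SGD processes translate directly into time-uniform (0,δ)-DP guarantees.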

Implications and Future Directions

  • The findings suggest that, in certain scenarios, the injection of heavy-tailed noise into SGD can offer comparable privacy guarantees with potentially better empirical utility than Gaussian noise, due to the intrinsic robustness features of heavy-tailed distributions.
  • The research opens avenues for further investigation into the role of heavy-tailed noise in enhancing the privacy-utility trade-off in machine learning models, particularly in light of the growing demand for stringent data protection measures.
  • Future work might involve exploring more intricate relationships between noise distribution characteristics, algorithmic stability, and differential privacy, alongside the computational benefits of using heavy-tailed noise instead of Gaussian noise.

Conclusion

This paper provides a comprehensive analysis of differential privacy guarantees for noisy stochastic gradient descent under heavy-tailed perturbations, broadening our understanding of noise-induced privacy in optimization algorithms. The results pave the way for developing more robust, privacy-preserving machine learning methods that leverage the unique benefits of heavy-tailed distributions.