
Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations (2403.02051v1)

Published 4 Mar 2024 in stat.ML, cs.CR, cs.LG, math.ST, and stat.TH

Abstract: Injecting heavy-tailed noise to the iterates of stochastic gradient descent (SGD) has received increasing attention over the past few years. While various theoretical properties of the resulting algorithm have been analyzed mainly from learning theory and optimization perspectives, their privacy preservation properties have not yet been established. Aiming to bridge this gap, we provide differential privacy (DP) guarantees for noisy SGD, when the injected noise follows an $\alpha$-stable distribution, which includes a spectrum of heavy-tailed distributions (with infinite variance) as well as the Gaussian distribution. Considering the $(\epsilon, \delta)$-DP framework, we show that SGD with heavy-tailed perturbations achieves $(0, \tilde{\mathcal{O}}(1/n))$-DP for a broad class of loss functions which can be non-convex, where $n$ is the number of data points. As a remarkable byproduct, contrary to prior work that necessitates bounded sensitivity for the gradients or clipping the iterates, our theory reveals that under mild assumptions, such a projection step is not actually necessary. We illustrate that the heavy-tailed noising mechanism achieves similar DP guarantees compared to the Gaussian case, which suggests that it can be a viable alternative to its light-tailed counterparts.


Summary

  • The paper establishes (0, Õ(1/n))-DP guarantees for noisy SGD with heavy-tailed (α-stable) noise, where n is the number of data points, without requiring gradient clipping.
  • It employs Lyapunov functions and Markov process theory to derive time-uniform privacy bounds that hold for a broad class of loss functions.
  • The findings indicate that heavy-tailed noise can retain differential privacy while potentially improving empirical utility relative to Gaussian noise.

Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations

Introduction

Stochastic Gradient Descent (SGD) is a cornerstone of machine learning, widely used for optimizing a variety of loss functions. Injecting noise into the SGD iterates, particularly heavy-tailed noise, has attracted increasing interest due to its potential benefits for both data privacy and learning performance. This paper presents a rigorous analysis of the differential privacy (DP) guarantees provided by noisy SGD when the added noise follows an α-stable distribution.
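
To make the noise model concrete, here is a minimal sketch of sampling symmetric α-stable noise via the classical Chambers–Mallows–Stuck method; the function name and scale conventions are illustrative choices, not taken from the paper. At α = 2 the samples are Gaussian (with variance 2 under this standard parameterization), while α < 2 yields heavy tails with infinite variance.

```python
import numpy as np

def sample_symmetric_alpha_stable(alpha, size, rng):
    """Symmetric alpha-stable samples via the Chambers-Mallows-Stuck method.

    alpha in (0, 2]: alpha = 2 recovers a Gaussian (variance 2 in this
    parameterization); alpha < 2 gives heavy tails with infinite variance.
    """
    v = rng.uniform(-np.pi / 2, np.pi / 2, size=size)  # V ~ Uniform(-pi/2, pi/2)
    w = rng.exponential(1.0, size=size)                # W ~ Exp(1)
    if np.isclose(alpha, 1.0):
        return np.tan(v)  # alpha = 1 is the standard Cauchy distribution
    return (np.sin(alpha * v) / np.cos(v) ** (1 / alpha)
            * (np.cos((1 - alpha) * v) / w) ** ((1 - alpha) / alpha))

rng = np.random.default_rng(0)
for a in (2.0, 1.8, 1.2):
    x = sample_symmetric_alpha_stable(a, 100_000, rng)
    print(f"alpha={a}: max |sample| = {np.max(np.abs(x)):.1f}")  # tails fatten as alpha drops
```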

Main Contributions

The paper's primary contribution is the establishment of (0,δ)-DP guarantees, with δ = Õ(1/n), for noisy SGD with heavy-tailed perturbations under the (ϵ,δ)-differential privacy framework. The analysis covers a broad class of loss functions, including non-convex ones, and encompasses both heavy-tailed α-stable distributions and the Gaussian special case. Key findings include:

  • DP Guarantees without Bounded Gradients: Under specific conditions, including pseudo-Lipschitz continuity of the gradients and high-probability boundedness of the data, noisy SGD enjoys DP guarantees without requiring gradient clipping or bounded-sensitivity assumptions (see the sketch after this list).
  • Time-Uniform Bounds: The derived DP bounds are uniform over time, meaning they do not degrade with an increasing number of iterations.
  • Applicability to Heavy-Tailed Noise: The guarantees cover the full family of α-stable distributions, yielding privacy bounds comparable to the Gaussian case that are largely unaffected by the heaviness of the noise tails.
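
For illustration, the following is a hypothetical minimal implementation of the perturbed recursion on a toy least-squares problem; note the absence of any clipping or projection step. The function `noisy_sgd`, its hyperparameters, and the way the noise is coupled to the step size are placeholder choices for this sketch, not the scalings used in the paper's analysis. `levy_stable` is SciPy's α-stable distribution; β = 0 gives symmetric noise and α = 2 recovers the Gaussian mechanism.

```python
import numpy as np
from scipy.stats import levy_stable

def noisy_sgd(X, y, alpha=1.8, lr=0.05, sigma=0.1, n_iters=1000, seed=0):
    """Noisy SGD on a least-squares loss: iterates perturbed with symmetric
    alpha-stable noise, with no gradient clipping or projection step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)                      # single-sample stochastic gradient
        grad = (X[i] @ theta - y[i]) * X[i]      # grad of 0.5 * (x_i^T theta - y_i)^2
        noise = levy_stable.rvs(alpha, 0.0, scale=sigma, size=d, random_state=rng)
        theta = theta - lr * (grad + noise)      # perturbed update, no clipping
    return theta

# Toy usage: recover a linear model from noisy observations.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=200)
print(noisy_sgd(X, y) - w_true)  # residuals stay small, up to the injected noise
```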

Technical Approach

The analytical approach involves:

  • Assessing the stability and ergodicity properties of the noisy SGD iterates via a novel technique based on constructing suitable Lyapunov functions.
  • Employing recent results from Markov process theory to bound the total variation (TV) distance between the laws of two noisy SGD processes run on datasets differing in a single data point; this TV bound translates directly into the main DP results (made precise below).
  • Analyzing how the properties of the noise distribution, especially its tail behavior, affect the DP guarantees of the algorithm.
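
To spell out the link between the TV distance and the stated privacy guarantee, recall the standard (ϵ,δ)-DP definition (see, e.g., Dwork and Roth's textbook); the derivation below is textbook material included for orientation, not a quotation from the paper.

```latex
% (\epsilon, \delta)-DP: for all neighboring datasets D, D' (differing in one
% record) and all measurable sets S,
\Pr[M(D) \in S] \;\le\; e^{\epsilon}\,\Pr[M(D') \in S] \;+\; \delta .
% Setting \epsilon = 0 and taking the supremum over S (in both directions,
% since D and D' may be swapped) yields exactly a total-variation bound:
(0, \delta)\text{-DP}
  \;\Longleftrightarrow\;
  \sup_{S}\,\bigl|\Pr[M(D) \in S] - \Pr[M(D') \in S]\bigr|
  \;=\; \mathrm{TV}\bigl(\mathrm{Law}(M(D)),\,\mathrm{Law}(M(D'))\bigr)
  \;\le\; \delta .
```

This equivalence is why time-uniform TV bounds between the two SGD processes translate directly into time-uniform (0,δ)-DP guarantees.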

Implications and Future Directions

  • The findings suggest that, in certain scenarios, the injection of heavy-tailed noise into SGD can offer comparable privacy guarantees with potentially better empirical utility than Gaussian noise, due to the intrinsic robustness features of heavy-tailed distributions.
  • The research opens avenues for further investigation into the role of heavy-tailed noise in enhancing the privacy-utility trade-off in machine learning models, particularly in light of the growing demand for stringent data protection measures.
  • Future work might involve exploring more intricate relationships between noise distribution characteristics, algorithmic stability, and differential privacy, alongside the computational benefits of using heavy-tailed noise instead of Gaussian noise.

Conclusion

This paper provides a comprehensive analysis of differential privacy guarantees for noisy stochastic gradient descent under heavy-tailed perturbations, broadening our understanding of noise-induced privacy in optimization algorithms. The results pave the way for developing more robust, privacy-preserving machine learning methods that leverage the unique benefits of heavy-tailed distributions.