Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations (2403.02051v1)
Abstract: Injecting heavy-tailed noise into the iterates of stochastic gradient descent (SGD) has received increasing attention over the past few years. While various theoretical properties of the resulting algorithm have been analyzed, mainly from learning-theoretic and optimization perspectives, its privacy-preservation properties have not yet been established. Aiming to bridge this gap, we provide differential privacy (DP) guarantees for noisy SGD when the injected noise follows an $\alpha$-stable distribution, a family that includes a spectrum of heavy-tailed distributions (with infinite variance) as well as the Gaussian distribution. Considering the $(\epsilon, \delta)$-DP framework, we show that SGD with heavy-tailed perturbations achieves $(0, \tilde{\mathcal{O}}(1/n))$-DP for a broad class of loss functions that can be non-convex, where $n$ is the number of data points. As a remarkable byproduct, and contrary to prior work that requires bounded gradient sensitivity or clipping of the iterates, our theory reveals that under mild assumptions such a projection step is not actually necessary. We illustrate that the heavy-tailed noise mechanism achieves DP guarantees comparable to the Gaussian case, which suggests that it can be a viable alternative to its light-tailed counterparts.
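To make the mechanism concrete, below is a minimal Python sketch of SGD whose iterates are perturbed by symmetric $\alpha$-stable noise drawn via `scipy.stats.levy_stable`. The noise scale `sigma`, the $\eta^{1/\alpha}$ scaling of the perturbation, and the toy least-squares objective are illustrative assumptions, not the paper's exact calibration; setting `alpha=2` recovers Gaussian perturbations, while `alpha < 2` gives the heavy-tailed (infinite-variance) regime, and, in line with the abstract's claim, no clipping or projection step is applied.

```python
# Minimal, illustrative sketch of SGD with heavy-tailed (alpha-stable) perturbations.
# The noise scale `sigma`, its coupling with the step size `eta`, and the toy loss
# below are assumptions for illustration, not the paper's exact mechanism.
import numpy as np
from scipy.stats import levy_stable


def heavy_tailed_noisy_sgd(grad_fn, theta0, data, eta=0.01, sigma=0.1,
                           alpha=1.8, n_iters=1000, batch_size=32, seed=0):
    """Run SGD whose iterates are perturbed by symmetric alpha-stable noise.

    alpha = 2 recovers Gaussian perturbations; alpha < 2 yields heavy tails
    (infinite variance). No gradient clipping or projection is performed.
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    n = len(data)
    for _ in range(n_iters):
        batch = data[rng.choice(n, size=batch_size, replace=False)]
        # Symmetric alpha-stable noise (skewness beta = 0), one draw per coordinate.
        noise = levy_stable.rvs(alpha, 0.0, scale=sigma, size=theta.shape,
                                random_state=rng)
        # Assumed scaling: the perturbation enters with eta**(1/alpha), the natural
        # Euler-Maruyama-type scaling for an alpha-stable driving process.
        theta = theta - eta * grad_fn(theta, batch) + eta ** (1.0 / alpha) * noise
    return theta


# Example usage on a toy least-squares problem (illustrative only).
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 5))
    y = X @ np.ones(5) + 0.1 * rng.normal(size=500)
    data = np.hstack([X, y[:, None]])

    def grad_fn(theta, batch):
        Xb, yb = batch[:, :-1], batch[:, -1]
        return Xb.T @ (Xb @ theta - yb) / len(yb)

    print(heavy_tailed_noisy_sgd(grad_fn, np.zeros(5), data))
```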