Improving Differentially Private SGD via Randomly Sparsified Gradients (2112.00845v3)

Published 1 Dec 2021 in cs.LG and cs.CR

Abstract: Differentially private stochastic gradient descent (DP-SGD) has been widely adopted in deep learning to provide rigorously defined privacy; it requires clipping to bound the maximum norm of individual gradients, followed by additive isotropic Gaussian noise. Through an analysis of the convergence rate of DP-SGD in a non-convex setting, we identify that randomly sparsifying gradients before clipping and noisification adjusts a trade-off between internal components of the convergence bound, yielding a smaller upper bound when the noise is dominant. Our theoretical analysis and empirical evaluations further show that this trade-off is nontrivial and possibly unique to DP-SGD: removing either the noisification or the gradient clipping eliminates the trade-off in the bound. This observation is suggestive, as it implies DP-SGD has inherent room for (even simply random) gradient compression. To verify and exploit this observation, we propose an efficient and lightweight extension using random sparsification (RS) to strengthen DP-SGD. Experiments with various DP-SGD frameworks show that RS improves performance. In addition, the sparse gradients produced by RS reduce communication cost and strengthen privacy against reconstruction attacks, both key problems in private machine learning.
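The RS mechanism the abstract describes (sparsify, then clip, then add noise) can be made concrete with a minimal NumPy sketch. This is an illustrative implementation under stated assumptions, not the paper's reference code: the function name `dp_sgd_step_with_rs`, the keep rate `p`, clipping norm `C`, and noise multiplier `sigma` are all hypothetical, and the choices of a shared random mask across the batch and of adding noise only on the kept coordinates are assumptions of this sketch.

```python
import numpy as np

def dp_sgd_step_with_rs(per_example_grads, p=0.5, C=1.0, sigma=1.0, rng=None):
    """One DP-SGD step with random sparsification (RS).

    Order of operations follows the abstract: randomly sparsify each
    per-example gradient, then clip, then add isotropic Gaussian noise.

    per_example_grads: array of shape (batch_size, d).
    p: fraction of coordinates kept by the random mask (assumed shared
       across the batch so the released gradient stays sparse).
    C: per-example L2 clipping bound; sigma: noise multiplier.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = per_example_grads.shape

    # Random sparsification: zero out roughly (1 - p) of the coordinates.
    mask = rng.random(d) < p
    sparse = per_example_grads * mask

    # Per-example clipping: scale each gradient so its L2 norm is <= C.
    norms = np.linalg.norm(sparse, axis=1, keepdims=True)
    clipped = sparse * np.minimum(1.0, C / np.maximum(norms, 1e-12))

    # Noisification: Gaussian noise scaled to the sensitivity C,
    # restricted here (an assumption of this sketch) to kept coordinates.
    noise = sigma * C * rng.standard_normal(d) * mask
    return (clipped.sum(axis=0) + noise) / n

# Example usage with synthetic per-example gradients.
grads = np.random.default_rng(0).standard_normal((32, 1000))
update = dp_sgd_step_with_rs(grads, p=0.3, C=1.0, sigma=1.1)
```

Because every example shares the same mask in this sketch, the released gradient is itself sparse, which is consistent with the communication-cost and reconstruction-attack advantages the abstract attributes to RS.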
