DP-AdamBC: Your DP-Adam Is Actually DP-SGD (Unless You Apply Bias Correction) (2312.14334v1)

Published 21 Dec 2023 in cs.LG and cs.CR

Abstract: The Adam optimizer is a popular choice in contemporary deep learning due to its strong empirical performance. However, we observe that in privacy-sensitive scenarios, the traditional use of Differential Privacy (DP) with the Adam optimizer leads to sub-optimal performance on several tasks. We find that this performance degradation is due to a DP bias in Adam's second moment estimator, introduced by the addition of independent noise in the gradient computation to enforce DP guarantees. This DP bias leads to a different scaling for low-variance parameter updates that is inconsistent with the behavior of non-private Adam. We propose DP-AdamBC, an optimization algorithm that removes the bias in the second moment estimation and recovers the expected behavior of Adam. Empirically, DP-AdamBC significantly improves the optimization performance of DP-Adam, with gains of up to 3.5% in final accuracy on image, text, and graph node classification tasks.
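The mechanism behind the correction is that the Gaussian noise injected for DP is independent of the true gradient, so it adds a known constant to the expected squared gradient per coordinate: E[g̃²] = E[g²] + (σC/B)², where σ is the noise multiplier, C the clipping norm, and B the batch size. For coordinates whose true second moment is small relative to this constant, uncorrected DP-Adam's preconditioner is dominated by the noise term, so updates effectively rescale like DP-SGD with a fixed step size (hence the title); subtracting the known noise variance from the second-moment estimate restores the intended adaptive scaling. Below is a minimal NumPy sketch of one such bias-corrected update step. It assumes `noisy_grad` is the usual DP-SGD gradient (clipped per-sample gradients summed, Gaussian noise of standard deviation `sigma * clip_norm` added, then averaged over `batch_size`); the function name, the flooring constant `gamma`, and its placement before the square root are illustrative choices, not the paper's exact specification.

```python
# Minimal sketch of a DP-Adam update with second-moment bias correction,
# under the assumptions stated above. Names (dp_adam_bc_step, gamma, ...)
# are illustrative, not the paper's API.
import numpy as np

def dp_adam_bc_step(param, m, v, noisy_grad, t,
                    lr=1e-3, beta1=0.9, beta2=0.999,
                    sigma=1.0, clip_norm=1.0, batch_size=256,
                    gamma=1e-8):
    """One bias-corrected DP-Adam update on a single parameter array.

    noisy_grad is assumed to be the DP-SGD gradient:
        (sum of clipped per-sample grads + N(0, (sigma*clip_norm)^2 I)) / B
    t is the 1-indexed step counter.
    """
    # Standard Adam moment updates, computed on the noisy gradient.
    m = beta1 * m + (1 - beta1) * noisy_grad
    v = beta2 * v + (1 - beta2) * noisy_grad ** 2

    # Adam's usual warm-up bias correction.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    # Each noisy-gradient coordinate carries additive noise variance
    # (sigma * clip_norm / B)^2, which inflates E[v_hat]. Subtract that
    # known constant, flooring at gamma to keep the denominator positive.
    noise_var = (sigma * clip_norm / batch_size) ** 2
    v_corrected = np.maximum(v_hat - noise_var, gamma)

    param = param - lr * m_hat / np.sqrt(v_corrected)
    return param, m, v
```

In a real training loop, the per-sample clipping, noise injection, and privacy accounting would typically come from a DP library such as Opacus, with the correction above applied to the noisy gradients it produces.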
