Equivariant Differentially Private Deep Learning: Why DP-SGD Needs Sparser Models (2301.13104v2)

Published 30 Jan 2023 in cs.CV, cs.CR, and cs.LG

Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) limits the amount of private information deep learning models can memorize during training. This is achieved by clipping and adding noise to the model's gradients, and thus networks with more parameters require proportionally stronger perturbation. As a result, large models have difficulties learning useful information, rendering training with DP-SGD exceedingly difficult on more challenging training tasks. Recent research has focused on combating this challenge through training adaptations such as heavy data augmentation and large batch sizes. However, these techniques further increase the computational overhead of DP-SGD and reduce its practical applicability. In this work, we propose using the principle of sparse model design to solve precisely such complex tasks with fewer parameters, higher accuracy, and in less time, thus serving as a promising direction for DP-SGD. We achieve such sparsity by design by introducing equivariant convolutional networks for model training with Differential Privacy. Using equivariant networks, we show that small and efficient architecture design can outperform current state-of-the-art models with substantially lower computational requirements. On CIFAR-10, we achieve an increase of up to $9\%$ in accuracy while reducing the computation time by more than $85\%$. Our results are a step towards efficient model architectures that make optimal use of their parameters and bridge the privacy-utility gap between private and non-private deep learning for computer vision.
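
For concreteness, here is a minimal PyTorch sketch of the DP-SGD update the abstract refers to (Abadi et al., CCS 2016): each example's gradient is clipped to a fixed L2 norm, the clipped gradients are summed, and Gaussian noise calibrated to the clipping bound is added. The function name, hyperparameter defaults, and the microbatching loop are illustrative assumptions, not the authors' code; practical implementations typically use vectorized per-sample gradients (e.g. via the Opacus library).

```python
import torch

def dp_sgd_step(model, loss_fn, xs, ys, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    """Illustrative DP-SGD update: clip per-sample gradients, sum, add noise."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # Per-sample gradients via microbatching (one example at a time).
    for x, y in zip(xs, ys):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        # Rescale so each per-sample gradient has L2 norm <= clip_norm.
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip_norm / (total_norm.item() + 1e-6))
        for s, g in zip(summed, grads):
            s.add_(scale * g)

    # Gaussian noise with std noise_multiplier * clip_norm is added to every
    # coordinate of the summed gradient, so the total perturbation grows with
    # the parameter count -- the scaling problem the paper targets.
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = noise_multiplier * clip_norm * torch.randn_like(p)
            p -= lr * (s + noise) / len(xs)
```

Because the noise is applied to every parameter independently, two models with equal accuracy but different sizes pay very different privacy "noise budgets", which is why the paper argues for parameter-efficient, sparse-by-design architectures.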

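The "sparsity by design" the abstract mentions comes from weight sharing across a symmetry group. As an illustration only (not the authors' released code), the sketch below builds one rotation-equivariant convolution with the e2cnn library underlying E(2)-equivariant steerable CNNs (Weiler and Cesa, 2019), assuming a discrete group of 8 planar rotations: filters are shared across all 8 orientations, so the layer exposes roughly 8x fewer free parameters than a standard convolution producing the same number of responses.

```python
import torch
from e2cnn import gspaces
from e2cnn import nn as enn

# Symmetry group: planar rotations by multiples of 45 degrees (C8).
r2_act = gspaces.Rot2dOnR2(N=8)

# Input: 3 RGB channels, each a scalar (trivial) field.
in_type = enn.FieldType(r2_act, 3 * [r2_act.trivial_repr])
# Output: 16 regular fields; the filter weights are shared across all
# 8 rotated copies, which is the source of the parameter savings.
out_type = enn.FieldType(r2_act, 16 * [r2_act.regular_repr])

conv = enn.R2Conv(in_type, out_type, kernel_size=3, padding=1)

# Rotating the input rotates/permutes the output fields accordingly.
x = enn.GeometricTensor(torch.randn(4, 3, 32, 32), in_type)
y = conv(x)
print(y.tensor.shape)  # torch.Size([4, 128, 32, 32]): 16 fields x 8 rotations
```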
Authors (3)
  1. Florian A. Hölzl (2 papers)
  2. Daniel Rueckert (335 papers)
  3. Georgios Kaissis (79 papers)
Citations (3)
