Clip Body and Tail Separately: High Probability Guarantees for DPSGD with Heavy Tails (2405.17529v1)

Published 27 May 2024 in cs.LG and cs.CR

Abstract: Differentially Private Stochastic Gradient Descent (DPSGD) is widely used to preserve training data privacy in deep learning: it first clips the gradients to a predefined norm and then injects calibrated noise into the training procedure. Existing DPSGD works typically assume the gradients follow sub-Gaussian distributions and design various clipping mechanisms to optimize training performance. However, recent studies have shown that gradients in deep learning exhibit a heavy-tail phenomenon, that is, the tails of the gradient distribution have infinite variance, which may lead to excessive clipping loss with existing DPSGD mechanisms. To address this problem, we propose a novel approach, Discriminative Clipping (DC)-DPSGD, with two key designs. First, we introduce a subspace identification technique to distinguish between body and tail gradients. Second, we present a discriminative clipping mechanism that applies different clipping thresholds to body and tail gradients to reduce the clipping loss. Under the non-convex condition, DC-DPSGD reduces the empirical gradient norm from ${\mathbb{O}\left(\log^{\max(0,\theta-1)}(T/\delta)\log^{2\theta}(\sqrt{T})\right)}$ to ${\mathbb{O}\left(\log(\sqrt{T})\right)}$ with heavy-tailed index $\theta\geq 1/2$, iterations $T$, and arbitrary probability $\delta$. Extensive experiments on four real-world datasets demonstrate that our approach outperforms three baselines by up to 9.72% in terms of accuracy.
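
The two designs described in the abstract amount to a per-sample clipping rule whose threshold depends on whether a gradient is classified as body or tail, followed by the usual Gaussian noise injection. Below is a minimal sketch of that idea in NumPy. It is not the paper's implementation: the subspace-identification step is replaced by a simple norm-quantile mask, the noise is calibrated to the larger threshold as a simplifying sensitivity assumption, and all names (dc_dpsgd_step, c_body, c_tail) are hypothetical.

```python
import numpy as np

def dc_dpsgd_step(per_sample_grads, tail_mask, c_body, c_tail, sigma, rng):
    """One simplified discriminative-clipping DP-SGD update (illustrative only).

    per_sample_grads : (n, d) array of per-example gradients.
    tail_mask        : boolean (n,) array marking "tail" gradients; the paper
                       derives this via subspace identification, replaced here
                       by an externally supplied mask.
    c_body, c_tail   : clipping thresholds for body and tail gradients.
    sigma            : noise multiplier; noise is scaled by the larger
                       threshold as a simplifying sensitivity assumption.
    """
    n, d = per_sample_grads.shape
    clipped = np.empty_like(per_sample_grads)
    for i in range(n):
        g = per_sample_grads[i]
        c = c_tail if tail_mask[i] else c_body           # discriminative threshold
        norm = np.linalg.norm(g)
        clipped[i] = g * min(1.0, c / max(norm, 1e-12))  # standard norm clipping
    noise = rng.normal(0.0, sigma * max(c_body, c_tail), size=d)
    return (clipped.sum(axis=0) + noise) / n             # noisy averaged gradient

# Toy usage: treat the largest-norm 10% of per-sample gradients as the tail.
rng = np.random.default_rng(0)
grads = rng.standard_t(df=2, size=(64, 10))   # heavy-tailed synthetic gradients
norms = np.linalg.norm(grads, axis=1)
tail_mask = norms > np.quantile(norms, 0.9)
update = dc_dpsgd_step(grads, tail_mask, c_body=1.0, c_tail=5.0, sigma=1.0, rng=rng)
print(update.shape)  # (10,)
```

The intended benefit of the two thresholds is that a small c_body keeps the noise scale low for the bulk of gradients while a larger c_tail avoids discarding the informative heavy-tail gradients; how the tail set is identified (a crude quantile rule above) is where the paper's subspace identification technique comes in.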
