
Neural Collapse Meets Differential Privacy: Curious Behaviors of NoisyGD with Near-perfect Representation Learning (2405.08920v3)

Published 14 May 2024 in cs.LG, cs.CR, cs.CV, and stat.ML

Abstract: A recent study by De et al. (2022) has reported that large-scale representation learning through pre-training on a public dataset significantly enhances differentially private (DP) learning in downstream tasks, despite the high dimensionality of the feature space. To theoretically explain this phenomenon, we consider the setting of a layer-peeled model in representation learning, which results in interesting phenomena related to learned features in deep learning and transfer learning, known as Neural Collapse (NC). Within the framework of NC, we establish an error bound indicating that the misclassification error is independent of dimension when the distance between actual features and the ideal ones is smaller than a threshold. Additionally, the quality of the features in the last layer is empirically evaluated under different pre-trained models within the framework of NC, showing that a more powerful transformer leads to a better feature representation. Furthermore, we reveal that DP fine-tuning is less robust compared to fine-tuning without DP, particularly in the presence of perturbations. These observations are supported by both theoretical analyses and experimental evaluation. Moreover, to enhance the robustness of DP fine-tuning, we suggest several strategies, such as feature normalization or employing dimension reduction methods like Principal Component Analysis (PCA). Empirically, we demonstrate a significant improvement in testing accuracy by conducting PCA on the last-layer features.

Introduction

Differential Privacy (DP) has become a key component of private deep learning. It provides a way to fine-tune publicly pre-trained models on private data while provably limiting how much the resulting model reveals about any individual training example. However, while DP fine-tuning shows impressive results, the error introduced by the privacy noise typically grows with the dimension of the model, which makes high-dimensional settings challenging.

This paper explores the interplay between Neural Collapse (NC) and Differential Privacy. The authors investigate how the phenomenon of Neural Collapse can aid in achieving near-perfect feature representations, thereby mitigating the dimension dependency problem typically associated with differentially private learning algorithms, specifically, Noisy Gradient Descent (NoisyGD).

Key Concepts

Neural Collapse (NC)

Neural Collapse is a fascinating phenomenon observed in deep neural networks trained for classification tasks. In the late stages of training, data representations in the network's last layer align in a highly organized manner:

  1. Collapse to Simplex ETF: The means of features corresponding to different classes form a simplex equiangular tight frame (ETF), so they are equal-length, equidistant, and maximally separated (see the sketch after this list).
  2. Within-class Variability Vanishing: Features from the same class become tightly clustered around their class mean.
  3. Convergence to Self-Duality: The last-layer classifier weights align with the class means, so classification reduces to picking the nearest class mean.
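
To make the ETF geometry concrete, here is a minimal numerical sketch (not taken from the paper; the class count K, dimension d, and variable names are illustrative) that constructs a K-class simplex ETF and verifies that its columns have equal norms and pairwise cosine similarity -1/(K-1):

```python
import numpy as np

K, d = 10, 512                                   # number of classes, feature dimension (illustrative)
rng = np.random.default_rng(0)

# Ideal class means: the columns of M form a simplex ETF,
# M = sqrt(K/(K-1)) * P (I_K - (1/K) 1 1^T) for any d x K matrix P with orthonormal columns.
P, _ = np.linalg.qr(rng.standard_normal((d, K))) # d x K, orthonormal columns
M = np.sqrt(K / (K - 1)) * P @ (np.eye(K) - np.ones((K, K)) / K)

norms = np.linalg.norm(M, axis=0)                # all class means have equal length
cosines = (M / norms).T @ (M / norms)            # pairwise cosine similarities
off_diag = cosines[~np.eye(K, dtype=bool)]
print("equal norms:", np.allclose(norms, norms[0]))
print("pairwise cosine = -1/(K-1):", np.allclose(off_diag, -1.0 / (K - 1)))
```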

Differential Privacy (DP)

DP offers a framework to ensure that the output of an algorithm does not reveal too much information about any individual input data point. NoisyGD provides DP guarantees by adding calibrated Gaussian noise to each (clipped or norm-bounded) gradient update, but the error this noise induces typically scales with the model dimension, which is problematic for high-dimensional models.
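
The following is a minimal sketch of full-batch NoisyGD for a linear classifier on fixed (e.g. pre-trained) features, under assumed choices (logistic loss, clipping norm C, noise multiplier sigma, learning rate); the paper's exact loss, update rule, and privacy accounting may differ:

```python
import numpy as np

def noisy_gd(X, y, T=100, lr=0.5, C=1.0, sigma=1.0, seed=0):
    """Full-batch noisy gradient descent for logistic regression.

    X: (n, d) feature matrix, y: (n,) labels in {0, 1}.
    Each per-example gradient is clipped to L2 norm C, then Gaussian noise with
    standard deviation sigma * C is added to the summed gradient before the step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(T):
        p = 1.0 / (1.0 + np.exp(-X @ theta))                 # predicted probabilities
        per_example = (p - y)[:, None] * X                   # (n, d) per-example gradients
        norms = np.linalg.norm(per_example, axis=1, keepdims=True)
        clipped = per_example / np.maximum(1.0, norms / C)   # clip to norm C
        noisy_sum = clipped.sum(axis=0) + sigma * C * rng.standard_normal(d)
        theta -= lr * noisy_sum / n
    return theta
```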

Main Contributions

Theoretical Insights

  • Dimension-Independent Error Bound: The paper theoretically establishes an error bound indicating that the misclassification error can be independent of the feature-space dimension if a specific threshold condition on the feature shift parameter β is met.
  • Feature Shift Parameter: A parameter β is introduced to quantify the deviation between the actual features and the ideal (Neural Collapse) features; the smaller β is, the better the representation. A hedged sketch of one way such a shift could be measured follows this list.
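
As an illustration only (the paper's precise definition of β is not reproduced here), one natural proxy is the largest distance between a normalized last-layer feature and the ideal simplex-ETF mean of its class:

```python
import numpy as np

def feature_shift(H, y, M):
    """Proxy for the feature shift: H is (n, d) last-layer features, y is (n,)
    integer class labels, M is (d, K) ideal simplex-ETF class means (e.g. from
    the earlier sketch). Returns the largest per-example deviation."""
    H_unit = H / np.linalg.norm(H, axis=1, keepdims=True)    # unit-normalize features
    return float(np.max(np.linalg.norm(H_unit - M[:, y].T, axis=1)))
```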

Empirical Evaluation

  • Neural Collapse and Robustness: The quality of last-layer features was tested with different pre-trained models, showing that more powerful transformers lead to better feature representations.
  • Dimension Reduction Techniques: Methods like Principal Component Analysis (PCA) are shown to improve the robustness of DP fine-tuning by reducing the dimension dependency; a minimal sketch of this preprocessing step follows the list.
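
A minimal sketch of that preprocessing, assuming PCA is applied to the extracted last-layer features before running NoisyGD (the number of components and function names are illustrative, and in a fully private pipeline the PCA step itself would need to be privatized or computed on public data):

```python
import numpy as np

def pca_project(H, k=50):
    """Project features H (n, d) onto their top-k principal components."""
    H_centered = H - H.mean(axis=0)
    # SVD of the centered feature matrix; rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(H_centered, full_matrices=False)
    return H_centered @ Vt[:k].T          # (n, k) reduced features

# Usage (with the NoisyGD sketch above): theta = noisy_gd(pca_project(H), y)
```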

Notable Results

  • Fine-tuning an ImageNet pre-trained Wide-ResNet on CIFAR-10 reaches 95.4% accuracy with DP guarantees, vastly exceeding the 67.0% accuracy when trained from scratch.
  • Applying PCA to the last-layer features empirically yields significant gains in testing accuracy and improves robustness against perturbations.
  • ViT pre-trained models exhibit smaller feature shift parameters (β ≈ 0.1) than ResNet-50 (β ≈ 0.2), highlighting the influence of model quality on feature representation.

Practical Implications

  1. Enhanced DP Learning: The finding that strong feature representations can make the learning error essentially dimension-independent strengthens the case for large pre-trained models as the backbone of privacy-preserving ML applications.
  2. Robustness to Perturbations: The observation that DP fine-tuning is less robust than its non-DP counterpart underscores the need for additional safeguards, such as PCA-based dimension reduction, to ensure reliability on real-world data.
  3. Practical Strategies for DP Fine-Tuning: Future work can develop more refined feature-normalization and dimension-reduction methods that explicitly account for the nature of data perturbations; a minimal sketch of the normalization step follows this list.
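
For concreteness, a minimal sketch of the feature-normalization strategy, assuming each last-layer feature is simply rescaled to unit L2 norm before NoisyGD (the paper may use a different normalization):

```python
import numpy as np

def normalize_features(H, eps=1e-12):
    """Rescale each row of H (n, d) to unit L2 norm; eps avoids division by zero."""
    return H / (np.linalg.norm(H, axis=1, keepdims=True) + eps)
```

For a linear head with logistic loss, unit-norm features bound every per-example gradient norm by 1, so the clipping step in NoisyGD distorts the gradients less.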

Speculative Future Developments

The paper opens up several avenues for further research:

  • Exploring Other Dimension Reduction Methods: Investigating additional techniques beyond PCA that could further mitigate the effects of high dimensionality.
  • Adversarial Robustness: Delving deeper into adversarial training methods tailored for DP fine-tuning, as adversarial perturbations pose stricter requirements on β.
  • Extended Neural Collapse Analysis: Applying NC principles to other DP learning setups, such as different neural architectures or additional fine-tuning strategies.

Conclusion

The intersection of Neural Collapse and Differential Privacy offers a promising route around the inherent challenges of high-dimensional data in DP learning. By harnessing strong pre-trained representations and employing simple feature-engineering techniques, it is possible to make differentially private learning both more robust and largely independent of the feature dimension. This paper sheds light on the curious yet ultimately beneficial role of Neural Collapse in DP fine-tuning, paving the way for more secure and efficient use of AI in privacy-sensitive applications.

References (49)
  1. Deep learning with differential privacy. In Weippl, E. R., Katzenbeisser, S., Kruegel, C., Myers, A. C., and Halevi, S., editors, Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016, pages 308–318. ACM.
  2. Privacy of noisy stochastic gradient descent: More iterations without more privacy loss. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A., editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022.
  3. Privacy amplification by subsampling: Tight analyses via couplings and divergences. In Bengio, S., Wallach, H. M., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, pages 6280–6290.
  4. DP-mix: Mixup-based data augmentation for differentially private learning. In Thirty-seventh Conference on Neural Information Processing Systems.
  5. Stability of stochastic gradient descent on nonsmooth convex losses. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H., editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual.
  6. Private stochastic convex optimization with optimal rates. In Wallach, H. M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E. B., and Garnett, R., editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 11279–11288.
  7. Private empirical risk minimization: Efficient algorithms and tight error bounds. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014, pages 464–473. IEEE Computer Society.
  8. Shifted interpolation for differential privacy. arXiv preprint arXiv:2403.00278.
  9. Deep Learning With Gaussian Differential Privacy. Harvard Data Science Review, 2(3). https://hdsr.mitpress.mit.edu/pub/u24wj42y.
  10. Differentially private bias-term only fine-tuning of foundation models. arXiv preprint arXiv:2210.00036.
  11. An equivalence between private classification and online prediction. In Irani, S., editor, 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16-19, 2020, pages 389–402. IEEE.
  12. Differentially private release and learning of threshold functions. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pages 634–649. IEEE.
  13. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Hirt, M. and Smith, A. D., editors, Theory of Cryptography - 14th International Conference, TCC 2016-B, Beijing, China, October 31 - November 3, 2016, Proceedings, Part I, volume 9985 of Lecture Notes in Computer Science, pages 635–658.
  14. Sample complexity bounds for differentially private learning. In Proceedings of the 24th Annual Conference on Learning Theory, pages 155–186. JMLR Workshop and Conference Proceedings.
  15. Unlocking high-accuracy differentially private image classification through scale. arXiv preprint arXiv:2204.13650.
  16. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  17. Gaussian differential privacy. J. R. Stat. Soc. Ser. B. Stat. Methodol., 84(1):3–54. With discussions and a reply by the authors.
  18. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
  19. Dwork, C. (2006). Differential privacy. In Bugliesi, M., Preneel, B., Sassone, V., and Wegener, I., editors, Automata, Languages and Programming, 33rd International Colloquium, ICALP 2006, Venice, Italy, July 10-14, 2006, Proceedings, Part II, volume 4052 of Lecture Notes in Computer Science, pages 1–12. Springer.
  20. An $\ell_{\infty}$ eigenvector perturbation bound and its application. J. Mach. Learn. Res., 18:207:1–207:42.
  21. Exploring deep neural networks via layer-peeled model: Minority collapse in imbalanced training. Proceedings of the National Academy of Sciences, 118(43):e2103091118.
  22. Private stochastic convex optimization: optimal rates in linear time. In Makarychev, K., Makarychev, Y., Tulsiani, M., Kamath, G., and Chuzhoy, J., editors, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 439–449. ACM.
  23. Differentially private diffusion models generate useful synthetic images. arXiv preprint arXiv:2302.13861.
  24. Generative adversarial networks. Commun. ACM, 63(11):139–144.
  25. A law of data separation in deep learning. Proceedings of the National Academy of Sciences, 120(36):e2221704120.
  26. Denoising diffusion probabilistic models. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H., editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual.
  27. Adaptive estimation of a quadratic functional by model selection. Ann. Statist., 28(5):1302–1338.
  28. When does differentially private learning not suffer in high dimensions? Advances in Neural Information Processing Systems, 35:28616–28630.
  29. What deep representations should we learn? – a neural collapse perspective. https://openreview.net/forum?id=ZKEhS93FjhR.
  30. Large language models can be strong differentially private learners. arXiv preprint arXiv:2110.05679.
  31. Statistical theory of differentially private marginal-based data synthesis algorithms. In The Eleventh International Conference on Learning Representations.
  32. Leveraging public data for practical private query release. In Meila, M. and Zhang, T., editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 6968–6977. PMLR.
  33. The tunnel effect: Building data representations in deep neural networks. NeurIPS.
  34. Winning the NIST contest: A scalable and general approach to differentially private synthetic data. J. Priv. Confidentiality, 11(3).
  35. Prevalence of neural collapse during the terminal phase of deep learning training. Proc. Natl. Acad. Sci. USA, 117(40):24652–24663.
  36. Improving language understanding by generative pre-training. OpenAI.
  37. Stochastic gradient descent with differentially private updates. In 2013 IEEE global conference on signal and information processing, pages 245–248. IEEE.
  38. On the importance of initialization and momentum in deep learning. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, volume 28 of JMLR Workshop and Conference Proceedings, pages 1139–1147. JMLR.org.
  39. Differentially private learning needs better features (or much more data). arXiv preprint arXiv:2011.11660.
  40. Unified enhancement of privacy bounds for mixture mechanisms via f-differential privacy. In NeurIPS.
  41. Analytical composition of differential privacy via the Edgeworth accountant.
  42. Wang, Y. (2018). Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain. In Globerson, A. and Silva, R., editors, Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI 2018, Monterey, California, USA, August 6-10, 2018, pages 93–103. AUAI Press.
  43. Subsampled Rényi differential privacy and analytical moments accountant. J. Priv. Confidentiality, 10(2).
  44. Learning with differential privacy: stability, learnability and the sufficiency and necessity of ERM principle. J. Mach. Learn. Res., 17:Paper No. 183, 40.
  45. Differentially private learning needs hidden state (or much faster convergence). In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A., editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022.
  46. Initialization matters: Privacy-utility analysis of overparameterized neural networks. In NeurIPS.
  47. Differentially private fine-tuning of language models. arXiv preprint arXiv:2110.06500.
  48. Optimal accounting of differential privacy via characteristic function. In Camps-Valls, G., Ruiz, F. J. R., and Valera, I., editors, International Conference on Artificial Intelligence and Statistics, AISTATS 2022, 28-30 March 2022, Virtual Event, volume 151 of Proceedings of Machine Learning Research, pages 4782–4817. PMLR.
  49. Poission subsampled Rényi differential privacy. In Chaudhuri, K. and Salakhutdinov, R., editors, Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 7634–7642. PMLR.
Authors (4)
  1. Chendi Wang (8 papers)
  2. Yuqing Zhu (34 papers)
  3. Weijie J. Su (69 papers)
  4. Yu-Xiang Wang (124 papers)
Citations (3)