On the Convergence and Calibration of Deep Learning with Differential Privacy (2106.07830v6)
Abstract: Differentially private (DP) training preserves data privacy, usually at the cost of slower convergence (and thus lower accuracy) and more severe mis-calibration than its non-private counterpart. To analyze the convergence of DP training, we formulate a continuous-time analysis through the lens of the neural tangent kernel (NTK), which characterizes per-sample gradient clipping and noise addition in DP training for arbitrary network architectures and loss functions. Interestingly, we show that the noise addition only affects the privacy risk, not the convergence or calibration, whereas the per-sample gradient clipping (under both flat and layerwise clipping styles) only affects the convergence and calibration. Furthermore, we observe that DP models trained with a small clipping norm usually achieve the best accuracy, but are poorly calibrated and thus unreliable. In sharp contrast, DP models trained with a large clipping norm enjoy the same privacy guarantee and similar accuracy, but are significantly better \textit{calibrated}. Our code can be found at \url{https://github.com/woodyx218/opacus_global_clipping}.
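The abstract refers to the standard DP-SGD update, in which each per-sample gradient is clipped to a norm bound (flat clipping rescales the whole gradient; layerwise clipping rescales each layer separately) and Gaussian noise proportional to that bound is added before the parameter step. The sketch below is a minimal, illustrative PyTorch rendering of the flat-clipping variant, plus a simple expected calibration error (ECE) estimator of the kind used to measure mis-calibration. It is not the paper's or Opacus's implementation; the function names, defaults, and the per-example loop are our own simplifications.

```python
import torch


def dp_sgd_step(model, loss_fn, x_batch, y_batch, lr=0.1,
                clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD step: flat per-sample clipping + Gaussian noise (illustrative sketch)."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # Per-sample gradients via a loop over single examples (microbatches of size 1).
    for x_i, y_i in zip(x_batch, y_batch):
        loss = loss_fn(model(x_i.unsqueeze(0)), y_i.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        # Flat clipping: rescale the whole per-sample gradient so its global
        # L2 norm is at most clip_norm.
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip_norm / (total_norm + 1e-12)).clamp(max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    # Add isotropic Gaussian noise calibrated to the clipping norm, average, and step.
    batch_size = len(x_batch)
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = noise_multiplier * clip_norm * torch.randn_like(s)
            p.add_(-(lr / batch_size) * (s + noise))


def expected_calibration_error(probs, labels, n_bins=15):
    """Expected Calibration Error over equal-width confidence bins (illustrative sketch)."""
    conf, pred = probs.max(dim=1)
    acc = pred.eq(labels).float()
    ece = torch.zeros(1)
    edges = torch.linspace(0, 1, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.float().mean() * (acc[mask].mean() - conf[mask].mean()).abs()
    return ece.item()
```

Under this sketch, the trade-off described in the abstract amounts to varying clip_norm while holding noise_multiplier (and hence the privacy guarantee) fixed: a small clip_norm tends toward higher accuracy but larger ECE, while a large clip_norm yields similar accuracy with noticeably better calibration.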
Authors:
- Zhiqi Bu
- Hua Wang
- Zongyu Dai
- Qi Long