DP-SGD for non-decomposable objective functions (2310.03104v1)
Abstract: Unsupervised pre-training is a common step in developing computer vision models and LLMs. In this setting, the absence of labels requires the use of similarity-based loss functions, such as contrastive loss, that favor minimizing the distance between similar inputs and maximizing the distance between distinct inputs. As privacy concerns mount, training these models using differential privacy has become more important. However, due to how inputs are generated for these losses, one of their undesirable properties is that their $L_2$ sensitivity can grow with increasing batch size. This property is particularly disadvantageous for differentially private training methods, such as DP-SGD. To overcome this issue, we develop a new DP-SGD variant for similarity-based loss functions -- in particular the commonly used contrastive loss -- that manipulates gradients of the objective function in a novel way to obtain a sensitivity of the summed gradient that is $O(1)$ in the batch size $n$. We test our DP-SGD variant on preliminary CIFAR-10 pre-training and CIFAR-100 fine-tuning tasks and show that, in both tasks, our method's performance comes close to that of a non-private model and generally outperforms DP-SGD applied directly to the contrastive loss.
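For context, the standard DP-SGD step the paper builds on clips each per-example gradient to a fixed $L_2$ norm, sums the clipped gradients, and adds Gaussian noise calibrated to that clipping norm. The sketch below illustrates only this baseline mechanism (function name and NumPy-based formulation are illustrative, not the paper's implementation); the paper's contribution is a gradient manipulation for contrastive-style losses that keeps the summed gradient's sensitivity $O(1)$ in the batch size, which this baseline alone does not achieve.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """One baseline DP-SGD aggregation step.

    Clips each per-example gradient to L2 norm `clip_norm`, sums the
    clipped gradients, adds Gaussian noise with standard deviation
    `noise_multiplier * clip_norm` (the sensitivity of the clipped sum),
    and returns the noisy average.
    """
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    summed = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    return (summed + noise) / len(per_example_grads)
```

With similarity-based losses, each "example" here would be a pair or batch-dependent term, which is precisely why the naive sensitivity can grow with batch size.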
Authors: William Kong, Andrés Muñoz Medina, Mónica Ribero