Enhancing Learning with Label Differential Privacy by Vector Approximation (2405.15150v1)
Abstract: Label differential privacy (DP) is a framework that protects the privacy of labels in training datasets while treating the feature vectors as public. Existing approaches protect labels by flipping them randomly and then training a model whose output approximates the privatized label. However, as the number of classes $K$ increases, stronger randomization is needed, so the performance of these methods degrades significantly. In this paper, we propose a vector approximation approach that is easy to implement and introduces little additional computational overhead. Instead of flipping each label into a single scalar, our method converts each label into a random vector with $K$ components, whose expectations reflect class conditional probabilities. Intuitively, a vector retains more information than a scalar label. A brief theoretical analysis shows that the performance of our method decays only slightly with $K$. Finally, we conduct experiments on both synthesized and real datasets, which validate our theoretical analysis and demonstrate the practical performance of our method.
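To make the contrast in the abstract concrete, the following minimal sketch compares the two kinds of privatized output. The abstract does not give the exact mechanisms, so both functions are assumptions: `k_randomized_response` is the standard $K$-ary randomized response (a scalar label flip), and `one_hot_bit_flip` is a RAPPOR-style randomizer that flips each bit of the one-hot label independently, which is one common way to obtain a $K$-component random vector under $\epsilon$-label DP. The function names, the NumPy-based implementation, and the choice of flip probability $1/(1+e^{\epsilon/2})$ are illustrative and not taken from the paper.

```python
import numpy as np


def k_randomized_response(labels, num_classes, epsilon, rng):
    """Scalar baseline (assumed): keep the label with probability
    e^eps / (e^eps + K - 1), otherwise output a uniformly random other label."""
    labels = np.asarray(labels)
    keep_prob = np.exp(epsilon) / (np.exp(epsilon) + num_classes - 1)
    keep = rng.random(labels.shape) < keep_prob
    # Adding an offset in 1..K-1 (mod K) yields a uniform *different* label.
    offsets = rng.integers(1, num_classes, size=labels.shape)
    flipped = (labels + offsets) % num_classes
    return np.where(keep, labels, flipped)


def one_hot_bit_flip(labels, num_classes, epsilon, rng):
    """Vector output (assumed, RAPPOR-style): flip each bit of the one-hot
    label independently with probability 1 / (1 + e^(eps/2)).
    Two labels differ in two bits, so the likelihood ratio is at most e^eps."""
    labels = np.asarray(labels)
    one_hot = np.eye(num_classes, dtype=np.int64)[labels]
    flip_prob = 1.0 / (1.0 + np.exp(epsilon / 2.0))
    flips = rng.random(one_hot.shape) < flip_prob
    return np.where(flips, 1 - one_hot, one_hot)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, eps = 100, 1.0
    y = rng.integers(0, K, size=5)
    print("true labels:        ", y)
    print("scalar privatized:  ", k_randomized_response(y, K, eps, rng))
    z = one_hot_bit_flip(y, K, eps, rng)
    print("vector shape:       ", z.shape)
    print("bit set at true idx:", z[np.arange(len(y)), y])
```

Under these assumptions, the probability that the scalar mechanism keeps the true label shrinks roughly as $e^{\epsilon}/K$ for large $K$, whereas each component of the vector has a flip probability that depends only on $\epsilon$. This matches the abstract's intuition that the vector form retains more information as $K$ grows.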