Closed-Form Bounds for DP-SGD against Record-level Inference (2402.14397v1)
Abstract: Machine learning models trained with differentially private (DP) algorithms such as DP-SGD enjoy resilience against a wide range of privacy attacks. Although it is possible to derive bounds for some attacks based solely on an $(\varepsilon,\delta)$-DP guarantee, meaningful bounds require a privacy budget small enough (i.e., injecting a large amount of noise) that utility suffers substantially. This paper presents a new approach to evaluating the privacy of machine learning models against specific record-level threats, such as membership and attribute inference, without the indirection through DP. We focus on the popular DP-SGD algorithm and derive simple closed-form bounds. Our proofs model DP-SGD as an information-theoretic channel whose inputs are the secrets that an attacker wants to infer (e.g., membership of a data record) and whose outputs are the intermediate model parameters produced by iterative optimization. We obtain bounds for membership inference that match state-of-the-art techniques, whilst being orders of magnitude faster to compute. Additionally, we present a novel data-dependent bound against attribute inference. Our results provide a direct, interpretable, and practical way to evaluate the privacy of trained models against specific inference threats without sacrificing utility.
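To make the channel view concrete, here is a minimal sketch of the kind of closed-form membership-inference bound the abstract describes, for $T$ adaptive releases of a Gaussian mechanism without subsampling. It relies only on the standard total-variation formula for equal-covariance Gaussians, $\mathrm{TV}(\mathcal{N}(\mu_1, s^2 I), \mathcal{N}(\mu_2, s^2 I)) = \operatorname{erf}(\lVert\mu_1-\mu_2\rVert / (2\sqrt{2}\,s))$; the function name and the omission of Poisson subsampling are our assumptions for illustration, not the paper's exact result.

```python
from math import erf, sqrt

def mia_advantage_bound(noise_multiplier: float, steps: int) -> float:
    """Illustrative closed-form upper bound on membership-inference
    advantage after `steps` releases of a (non-subsampled) Gaussian
    mechanism, where the noise std is noise_multiplier * clipping norm.

    Over T adaptive steps, the two joint output distributions the
    attacker must distinguish are Gaussians whose means differ by at
    most sqrt(T) times the clipping norm in L2, so their total
    variation distance is erf(sqrt(T) / (2 * sqrt(2) * sigma)).
    """
    return erf(sqrt(steps) / (2 * sqrt(2) * noise_multiplier))

# Example: with a balanced prior over membership, the attacker's
# success probability is at most (1 + advantage) / 2.
adv = mia_advantage_bound(noise_multiplier=8.0, steps=100)
print(f"advantage <= {adv:.3f}, success probability <= {(1 + adv) / 2:.3f}")
```

Unlike numerical privacy accountants, such an expression is evaluated in constant time from the training hyperparameters alone, which is what makes bounds of this shape orders of magnitude faster to compute.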
Authors: Giovanni Cherubin, Boris Köpf, Andrew Paverd, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin