Investigating Membership Inference Attacks under Data Dependencies (2010.12112v4)
Abstract: Training machine learning models on privacy-sensitive data has become a popular practice, driving innovation in ever-expanding fields. This has opened the door to new attacks that can have serious privacy implications. One such attack, the Membership Inference Attack (MIA), exposes whether or not a particular data point was used to train a model. A growing body of literature uses Differentially Private (DP) training algorithms as a defence against such attacks. However, these works evaluate the defence under the restrictive assumption that all members of the training set, as well as non-members, are independent and identically distributed. This assumption does not hold for many real-world use cases in the literature. Motivated by this, we evaluate membership inference with statistical dependencies among samples and explain why DP does not provide meaningful protection (the privacy parameter $\epsilon$ scales with the training set size $n$) in this more general case. We conduct a series of empirical evaluations with off-the-shelf MIAs using training sets built from real-world data showing different types of dependencies among samples. Our results reveal that training set dependencies can severely increase the performance of MIAs, and therefore assuming that data samples are statistically independent can significantly underestimate the performance of MIAs.
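As a back-of-the-envelope sketch (not a calculation quoted from the paper) of why $\epsilon$ scaling with $n$ makes the guarantee vacuous, assume the standard membership-advantage bound for $\epsilon$-DP training (Yeom et al., 2018) together with the usual group-privacy property of differential privacy. Under the i.i.d. assumption, an $\epsilon$-DP training algorithm bounds any adversary's membership advantage by
$$\mathrm{Adv} \;\le\; e^{\epsilon} - 1.$$
When training samples are statistically dependent, distinguishing the two membership hypotheses can require hiding the influence of up to $n$ correlated records at once; group privacy then only yields an $n\epsilon$-DP guarantee, so the corresponding bound degrades to
$$\mathrm{Adv} \;\le\; e^{n\epsilon} - 1,$$
which is vacuous in realistic settings: with $\epsilon = 0.1$ and $n = 100$, the bound is already $e^{10} - 1 \approx 2.2 \times 10^{4}$, far above the trivial maximum advantage of $1$.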
Authors:
- Thomas Humphries
- Simon Oya
- Lindsey Tulloch
- Matthew Rafuse
- Ian Goldberg
- Urs Hengartner
- Florian Kerschbaum