On the Privacy Effect of Data Enhancement via the Lens of Memorization (2208.08270v4)

Published 17 Aug 2022 in cs.LG, cs.CR, and cs.CV

Abstract: Machine learning poses severe privacy concerns, as learned models have been shown to reveal sensitive information about their training data. Many works have investigated the effect of widely adopted data augmentation and adversarial training techniques, collectively termed data enhancement in this paper, on the privacy leakage of machine learning models. Such privacy effects are often measured by membership inference attacks (MIAs), which aim to identify whether a particular example belongs to the training set or not. We propose to investigate privacy from a new perspective called memorization. Through the lens of memorization, we find that previously deployed MIAs produce misleading results, as they are less likely to identify samples with higher privacy risks as members compared to samples with low privacy risks. To solve this problem, we deploy a recent attack that can capture the memorization degree of individual samples for evaluation. Through extensive experiments, we unveil several findings about the connections between three essential properties of machine learning models: privacy, generalization gap, and adversarial robustness. We demonstrate that the generalization gap and privacy leakage are less correlated than previous results suggest. Moreover, there is not necessarily a trade-off between adversarial robustness and privacy, as stronger adversarial robustness does not make the model more susceptible to privacy attacks.
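
To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the paper's actual attack: a loss-threshold membership inference attack in the style of Yeom et al., plus a leave-one-out per-sample memorization estimate in the spirit of Feldman. All arrays, thresholds, and accuracy values below are hypothetical placeholders chosen purely for illustration.

    # Minimal illustrative sketch (hypothetical data, not the paper's method).
    import numpy as np

    def loss_threshold_mia(member_losses, nonmember_losses, threshold):
        """Predict 'member' when a sample's loss falls below the threshold.
        Returns the attack's balanced accuracy over the two loss populations."""
        tpr = np.mean(member_losses < threshold)      # members correctly flagged
        tnr = np.mean(nonmember_losses >= threshold)  # non-members correctly rejected
        return 0.5 * (tpr + tnr)

    def memorization_score(acc_with_sample, acc_without_sample):
        """Feldman-style memorization: how much including a sample (x, y) in
        training raises the model's probability of predicting y on x, here
        estimated from two hypothetical accuracy values."""
        return acc_with_sample - acc_without_sample

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Hypothetical per-sample cross-entropy losses: members tend to have
        # lower loss because the model has (partially) memorized them.
        member_losses = rng.gamma(shape=2.0, scale=0.3, size=1000)
        nonmember_losses = rng.gamma(shape=2.0, scale=0.6, size=1000)
        print("attack balanced accuracy:",
              loss_threshold_mia(member_losses, nonmember_losses, threshold=0.8))
        print("memorization of one hypothetical sample:",
              memorization_score(acc_with_sample=0.95, acc_without_sample=0.40))

A simple threshold attack of this kind is most likely to flag low-loss (well-fit, typical) samples as members, which is exactly the mismatch the paper highlights: highly memorized samples with large privacy risk need not be the ones such attacks identify first.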

Authors (4)
  1. Xiao Li (354 papers)
  2. Qiongxiu Li (26 papers)
  3. Zhanhao Hu (14 papers)
  4. Xiaolin Hu (97 papers)
Citations (11)