ML Privacy Meter: Aiding Regulatory Compliance by Quantifying the Privacy Risks of Machine Learning (2007.09339v1)

Published 18 Jul 2020 in cs.CR and cs.LG

Abstract: When building machine learning models using sensitive data, organizations should ensure that the data processed in such systems is adequately protected. For projects involving machine learning on personal data, Article 35 of the GDPR mandates performing a Data Protection Impact Assessment (DPIA). In addition to the threat of illegitimate access to data through security breaches, machine learning models pose an additional privacy risk to the data by indirectly revealing information about it through model predictions and parameters. Guidance released by the Information Commissioner's Office (UK) and the National Institute of Standards and Technology (US) emphasizes the threat to data from models and recommends that organizations account for and estimate these risks to comply with data protection regulations. Hence, there is an immediate need for a tool that can quantify the privacy risk to data from models. In this paper, we focus on this indirect leakage about training data from machine learning models. We present ML Privacy Meter, a tool that can quantify the privacy risk to data from models through state-of-the-art membership inference attack techniques. We discuss how this tool can help practitioners comply with data protection regulations when deploying machine learning models.

Citations (68)

Summary

  • The paper presents ML Privacy Meter to quantify ML privacy risks using membership inference attacks and provide risk scores for DPIA.
  • It evaluates both black-box and white-box settings, generating ROC curves to illustrate true-positive versus false-positive trade-offs.
  • It offers actionable insights for risk mitigation and GDPR compliance by guiding the selection of privacy-preserving techniques.

Quantifying Privacy Risks in Machine Learning: An Analysis of the ML Privacy Meter

The paper "ML Privacy Meter: Aiding Regulatory Compliance by Quantifying the Privacy Risks of Machine Learning" by Sasi Kumar Murakonda and Reza Shokri addresses a critical aspect of ML concerning the quantification of privacy risks associated with model training on sensitive data. This work is positioned within the context of compliance with data protection regulations, particularly the General Data Protection Regulation (GDPR), which necessitates a Data Protection Impact Assessment (DPIA).

Privacy Risks in Machine Learning

Machine learning models inherently encode information about their training datasets. While the primary expectation is for models to learn general patterns, they often memorize specific data points. This memorization poses a privacy risk, especially when the training data includes sensitive personal information. Membership inference attacks exploit this memorization by determining whether a specific data point was included in the training set, based on the model's behavior on that point.
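
As a concrete illustration of how such an attack can work, below is a minimal loss-threshold baseline, in the spirit of the simplest membership inference attacks: a record is guessed to be a training-set member when the model's loss on it is unusually low. The function names, toy data, and threshold value are illustrative assumptions, not part of the paper's tool.

```python
import numpy as np

def per_example_loss(probs, labels, eps=1e-12):
    """Cross-entropy loss of each record, from predicted class probabilities."""
    return -np.log(np.clip(probs[np.arange(len(labels)), labels], eps, None))

def loss_threshold_attack(probs, labels, threshold):
    """Guess 'member' (1) when the per-example loss is below the threshold.

    Models tend to assign lower loss to records they were trained on,
    which is exactly the memorization signal membership inference exploits.
    """
    return (per_example_loss(probs, labels) < threshold).astype(int)

# Toy usage with random predictions (replace with real model outputs):
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=100)   # 100 records, 10 classes
labels = rng.integers(0, 10, size=100)
guesses = loss_threshold_attack(probs, labels, threshold=1.5)
print("records flagged as members:", int(guesses.sum()))
```

In practice the threshold is usually calibrated on records the auditor already knows to be members or non-members, rather than picked by hand.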

The paper emphasizes two prevalent settings in which privacy risks are assessed (the sketch after this list illustrates the difference in what an attacker can observe):

  • Black-box Access: The attacker can only observe the model's predictions. This scenario reflects common use cases where ML models are deployed as services on platforms like Amazon Web Services, Microsoft Azure, or Google Cloud.
  • White-box Access: The attacker has access to both the model's predictions and internal parameters. This is relevant in situations where models are shared or outsourced, such as in federated learning environments.
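
The sketch below, assuming a PyTorch classifier, shows how the observable signal differs: in the black-box case the attacker only sees output probabilities, while in the white-box case it can also compute internal quantities such as per-example gradient norms. The model and helper names are placeholders for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def black_box_features(model, x):
    """Black-box setting: only the model's output probabilities are visible."""
    with torch.no_grad():
        return F.softmax(model(x), dim=1)

def white_box_features(model, x, y):
    """White-box setting: predictions plus internal signals, e.g. the norm of
    the loss gradient w.r.t. the parameters, computed per example."""
    grad_norms = []
    for xi, yi in zip(x, y):
        loss = F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
        grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
        grad_norms.append(torch.sqrt(sum(g.pow(2).sum() for g in grads)))
    return black_box_features(model, x), torch.stack(grad_norms)

# Toy usage with a small linear classifier standing in for a real model:
model = torch.nn.Linear(20, 5)
x = torch.randn(8, 20)
y = torch.randint(0, 5, (8,))
probs, gradient_norms = white_box_features(model, x, y)
print(probs.shape, gradient_norms.shape)   # (8, 5) and (8,)
```

Federated learning is the canonical white-box case: participants exchange model parameters or gradients, so this richer signal is available to each of them.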

ML Privacy Meter

The ML Privacy Meter is introduced as a tool designed to quantify the privacy risks associated with machine learning models. It utilizes membership inference attacks to determine these risks and provides risk scores that indicate the likelihood of data records being inferred from a model’s outputs or parameters. The tool generates detailed reports assessing both aggregate and individual privacy risks, which can guide compliance with regulations like GDPR.
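
The tool's actual report format is not reproduced here; the following is a minimal sketch of how per-record attack scores (for instance, an attack classifier's membership probabilities) could be turned into the kind of individual and aggregate risk figures the text describes.

```python
import numpy as np

def risk_report(attack_scores, record_ids, high_risk_threshold=0.8):
    """Summarize per-record membership risk scores in [0, 1].

    attack_scores[i] is the inferred probability that record_ids[i]
    was part of the training set."""
    scores = np.asarray(attack_scores, dtype=float)
    high_risk = scores >= high_risk_threshold
    return {
        "mean_risk": float(scores.mean()),              # aggregate leakage signal
        "max_risk": float(scores.max()),                # worst-case individual record
        "fraction_high_risk": float(high_risk.mean()),
        "high_risk_records": [record_ids[i] for i in np.flatnonzero(high_risk)],
    }

print(risk_report([0.30, 0.95, 0.62], ["r1", "r2", "r3"]))
```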

A significant contribution of the ML Privacy Meter is its ability to simulate various attack scenarios, analyzing potential privacy leaks. It produces ROC curves that visualize the trade-off between true positive and false positive rates of membership inference attacks. A higher area under the ROC curve correlates with greater information leakage, highlighting the risk posed by the model.
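
A minimal sketch of that ROC-style evaluation, using scikit-learn; the attack scores and ground-truth member labels are assumed to come from running a membership inference attack on records the auditor knows to be members and non-members (the numbers below are synthetic).

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# is_member[i] = 1 if record i was in the training set, 0 otherwise.
# attack_score[i] = attack's confidence that record i is a member.
rng = np.random.default_rng(0)
is_member = np.concatenate([np.ones(500, dtype=int), np.zeros(500, dtype=int)])
attack_score = np.concatenate([rng.normal(0.7, 0.2, 500),   # members tend to score higher
                               rng.normal(0.5, 0.2, 500)])  # when the model leaks

fpr, tpr, thresholds = roc_curve(is_member, attack_score)
auc = roc_auc_score(is_member, attack_score)
print(f"attack AUC = {auc:.3f}")   # ~0.5 means no leakage; closer to 1.0 means more
```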

Practical Implications

From a regulatory compliance perspective, the ML Privacy Meter equips organizations with a mechanism to evaluate and mitigate privacy risks. By providing quantitative assessments, it enables organizations to:

  1. Perform informed DPIAs by analyzing the model’s privacy risks.
  2. Select appropriate privacy-preserving techniques, such as differential privacy, by evaluating risk under different privacy parameters.
  3. Implement practical risk reduction strategies such as adjusting model regularization or data resampling (compared in the sketch after this list).
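
As an illustration of points 2 and 3, the sketch below compares a simple loss-based attack's AUC for classifiers trained with different L2 regularization strengths. scikit-learn is used here as a stand-in for whatever training pipeline and mitigation (differential privacy, resampling, etc.) an organization actually evaluates; the dataset and parameter grid are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for C in [10.0, 1.0, 0.1]:          # smaller C = stronger L2 regularization
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_tr, y_tr)

    def member_score(X_, y_):
        # Negative per-example log-loss as the attack score:
        # training members tend to have lower loss, hence higher scores.
        p = clf.predict_proba(X_)[np.arange(len(y_)), y_]
        return np.log(np.clip(p, 1e-12, None))

    scores = np.concatenate([member_score(X_tr, y_tr), member_score(X_te, y_te)])
    labels = np.concatenate([np.ones(len(y_tr)), np.zeros(len(y_te))])
    print(f"C={C:>4}: attack AUC = {roc_auc_score(labels, scores):.3f}")
```

A lower attack AUC after a mitigation is applied indicates reduced membership leakage, at whatever cost in utility the organization measures separately.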

Future Directions

This paper's approach suggests several avenues for future research and development. Enhancements to the ML Privacy Meter could include support for a broader range of attack models and integration with differential privacy frameworks to optimize utility-privacy trade-offs. Furthermore, as regulatory frameworks evolve, tools like the ML Privacy Meter will be instrumental in adapting compliance measures to new standards.

In conclusion, this work provides a foundational contribution to the field of ML privacy assessment, enabling organizations to understand and mitigate risks associated with machine learning on sensitive data. The ML Privacy Meter thus emerges as a pivotal tool in aligning ML practices with privacy regulations, ensuring responsible deployment of AI technologies.