MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective (2405.14981v2)
Abstract: The growing richness of large-scale datasets has been crucial in driving the rapid advancement and wide adoption of machine learning technologies. The massive collection and usage of data, however, pose an increasing risk to people's private and sensitive information through either inadvertent mishandling or malicious exploitation. Besides legislative solutions, many technical approaches to data privacy protection have been proposed. However, these approaches have various limitations, such as degrading data availability and utility, or relying on heuristics without a solid theoretical basis. To overcome these limitations, we propose a formal information-theoretic definition of this utility-preserving privacy protection problem, and design a data-driven, learnable data transformation framework that can selectively suppress sensitive attributes in target datasets while preserving the other useful attributes, whether or not those useful attributes are known in advance or explicitly annotated for preservation. We provide rigorous theoretical analyses of the operational bounds of our framework, and carry out comprehensive experimental evaluations on datasets of several modalities, including facial images, voice audio clips, and human activity motion sensor signals. Results demonstrate the effectiveness and generalizability of our method under various configurations on a multitude of tasks. Our code is available at https://github.com/jpmorganchase/MaSS.