ATLAS: Automatically Detecting Discrepancies Between Privacy Policies and Privacy Labels
Abstract: Privacy policies are long, complex documents that end-users seldom read. Privacy labels aim to ameliorate these issues by providing succinct summaries of salient data practices. In December 2020, Apple began requiring that app developers submit privacy labels describing their apps' data practices. Yet, research suggests that app developers often struggle to do so. In this paper, we automatically identify possible discrepancies between mobile app privacy policies and their privacy labels. Such discrepancies could be indicators of potential privacy compliance issues. We introduce the Automated Privacy Label Analysis System (ATLAS). ATLAS includes three components: a pipeline to systematically retrieve iOS App Store listings and privacy policies; an ensemble-based classifier capable of predicting privacy labels from the text of privacy policies with 91.3% accuracy using state-of-the-art NLP techniques; and a discrepancy analysis mechanism that enables a large-scale privacy analysis of the iOS App Store. Our system has enabled us to analyze 354,725 iOS apps. We find several interesting trends. For example, only 40.3% of apps in the App Store provide easily accessible privacy policies, and only 29.6% of apps provide both accessible privacy policies and privacy labels. Among apps that provide both, 88.0% have at least one possible discrepancy between the text of their privacy policy and their privacy label, which could be indicative of a potential compliance issue. We find that, on average, apps have 5.32 such potential compliance issues. We hope that ATLAS will help app developers, researchers, regulators, and mobile app stores alike. For example, app developers could use our classifier to check for discrepancies between their privacy policies and privacy labels, and regulators could use our system to help review apps at scale for potential compliance issues.
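To make the discrepancy analysis concrete, the sketch below compares the data categories predicted from a privacy policy against those declared in the app's privacy label, then computes the two aggregate statistics of the kind reported above: the share of apps with at least one possible discrepancy and the mean number of discrepancies per app. It is a minimal illustration and not the ATLAS implementation. The names `Discrepancies`, `find_discrepancies`, and `summarize` are hypothetical, the category strings are illustrative, and counting a discrepancy as any category present on one side but not the other is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancies:
    """Hypothetical container for one app's possible discrepancies."""
    policy_only: frozenset  # predicted from the policy text, absent from the label
    label_only: frozenset   # declared in the label, not predicted from the policy

    @property
    def count(self) -> int:
        return len(self.policy_only) + len(self.label_only)


def find_discrepancies(predicted, declared) -> Discrepancies:
    """Compare predicted and declared data categories for a single app.

    Assumption: a discrepancy is any category that appears on exactly
    one side (the symmetric difference of the two sets).
    """
    predicted, declared = frozenset(predicted), frozenset(declared)
    return Discrepancies(predicted - declared, declared - predicted)


def summarize(apps):
    """Return (share of apps with >= 1 discrepancy, mean discrepancies per app).

    `apps` is an iterable of (predicted, declared) pairs of category sets.
    """
    counts = [find_discrepancies(p, d).count for p, d in apps]
    if not counts:
        return 0.0, 0.0
    share = sum(c > 0 for c in counts) / len(counts)
    mean = sum(counts) / len(counts)
    return share, mean


if __name__ == "__main__":
    # Illustrative inputs only; these are not data from the study.
    apps = [
        ({"Location"}, {"Location"}),
        ({"Location", "Identifiers"}, {"Identifiers", "Contact Info"}),
    ]
    print(summarize(apps))
```

Keeping the two directions separate (`policy_only` versus `label_only`) lets a developer see whether the policy mentions a practice the label omits or the reverse, which matters when deciding which document to correct.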