Automated Detection and Analysis of Data Practices Using A Real-World Corpus (2402.11006v1)
Abstract: Privacy policies are crucial for informing users about data practices, yet their length and complexity often deter users from reading them. In this paper, we propose an automated approach to identify and visualize data practices within privacy policies at different levels of detail. Leveraging crowd-sourced annotations from the ToS;DR platform, we experiment with various methods to match policy excerpts with predefined data practice descriptions. We further conduct a case study to evaluate our approach on a real-world policy, demonstrating its effectiveness in simplifying complex policies. Experiments show that our approach accurately matches data practice descriptions with policy excerpts, facilitating the presentation of simplified privacy information to users.
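To make the matching task in the abstract concrete, the following is a minimal sketch of pairing a policy excerpt with the closest predefined data practice description. It is an illustrative stand-in, not the method evaluated in the paper: it assumes a simple bag-of-words cosine similarity, and the descriptions in `DESCRIPTIONS`, as well as the helper names `tokenize`, `cosine`, and `best_match`, are hypothetical and not taken from ToS;DR.

```python
import math
import re
from collections import Counter

# Hypothetical data practice descriptions (illustrative only, not from ToS;DR).
DESCRIPTIONS = [
    "This service shares your personal data with third parties",
    "You can request deletion of your account and data",
    "This service uses cookies to track you",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def best_match(excerpt, descriptions=DESCRIPTIONS):
    """Return (score, description) for the description most similar to the excerpt."""
    vec = Counter(tokenize(excerpt))
    scored = [(cosine(vec, Counter(tokenize(d))), d) for d in descriptions]
    return max(scored)


if __name__ == "__main__":
    score, description = best_match(
        "We may share personal data with third parties such as advertisers"
    )
    print(f"{score:.2f}  {description}")
```

A learned sentence encoder could replace the bag-of-words vectors without changing the overall structure: embed the excerpt and each description, then rank the descriptions by similarity.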