A Comparative Audit of Privacy Policies from Healthcare Organizations in USA, UK and India (2306.11557v1)
Abstract: Data privacy in healthcare is of paramount importance (and is therefore regulated by laws such as HIPAA) because patient data is highly sensitive. Healthcare organizations describe how they collect, process, store, and share this data (i.e., their data practices) in their privacy policies. There is thus a need to audit these policies and check their compliance with the respective laws. This paper addresses this need and presents a large-scale, data-driven study that audits privacy policies from healthcare organizations in three countries: the USA, the UK, and India. We developed a novel three-stage workflow for our audit. First, we collected the privacy policies of thousands of healthcare organizations in these countries and cleaned this data using a clustering-based mixed-method technique. We identified data practices regarding users' private medical data (medical history) and site privacy (cookies, logs) in these policies. Second, we adopted a summarization-based technique to uncover the broad data practices across countries and note important differences. Finally, we evaluated the cross-country data practices through the lens of legal compliance (with legal expert feedback), grounded in the theory of Contextual Integrity (CI). Alarmingly, our legal experts pointed out six themes of non-alignment (observed in 21.8% of the data practices studied in India). Furthermore, they identified four potential violations according to case verdicts from Indian courts. We conclude by discussing the utility of our auditing workflow and the implications of our findings for different stakeholders.