A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs (2311.13881v1)

Published 23 Nov 2023 in cs.SE and cs.AI

Abstract: Specifying legal requirements for software systems to ensure their compliance with the applicable regulations is a major concern in requirements engineering (RE). Personal data that is collected by an organization is often shared with other organizations to perform certain processing activities. In such cases, the General Data Protection Regulation (GDPR) requires issuing a data processing agreement (DPA), which regulates the processing and further ensures that personal data remains protected. Violating GDPR can lead to huge fines reaching billions of euros. Software systems involving personal data processing must adhere to the legal obligations stipulated in GDPR and outlined in DPAs. Requirements engineers can elicit from DPAs the legal requirements for regulating the data processing activities in software systems. Checking the completeness of a DPA according to the GDPR provisions is therefore an essential prerequisite to ensure that the elicited requirements are complete. Analyzing DPAs entirely manually is time-consuming and requires adequate legal expertise. In this paper, we propose an automation strategy to address the completeness checking of DPAs against GDPR. Specifically, we pursue ten alternative solutions enabled by different technologies, namely traditional machine learning, deep learning, language modeling, and few-shot learning. The goal of our work is to empirically examine how these different technologies fare in the legal domain. We computed the F2 score on a set of 30 real DPAs. Our evaluation shows that the best-performing solutions, which yield F2 scores of 86.7% and 89.7%, are based on the pre-trained BERT and RoBERTa language models. Our analysis further shows that other alternative solutions based on deep learning (e.g., BiLSTM) and few-shot learning (e.g., SetFit) can achieve comparable accuracy, yet are more efficient to develop.
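The F2 score mentioned in the abstract is the F-beta measure with beta = 2, which weights recall more heavily than precision. That weighting fits settings where missing a problem is presumably costlier than raising a false alarm. Below is a minimal sketch of the computation, assuming the true-positive, false-positive, and false-negative counts are already available. The function name `f_beta` and the counts in the usage line are hypothetical illustrations, not code or results from the paper.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from raw counts; beta=2 gives the F2 score (hypothetical helper)."""
    # Precision and recall, each defined as 0.0 when its denominator is 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, for illustration only (not taken from the paper).
print(round(f_beta(tp=40, fp=20, fn=5), 3))  # → 0.833
```

With the same hypothetical counts, beta = 1 (the ordinary F1 score) gives about 0.762. The F2 score comes out higher because recall (about 0.889) exceeds precision (about 0.667) in this example.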

Authors (2)
  1. Muhammad Ilyas Azeem (2 papers)
  2. Sallam Abualhaija (13 papers)
Citations (4)
