
Identification of Regulatory Requirements Relevant to Business Processes: A Comparative Study on Generative AI, Embedding-based Ranking, Crowd and Expert-driven Methods (2401.02986v1)

Published 2 Jan 2024 in cs.CL and cs.AI

Abstract: Organizations face the challenge of ensuring compliance with an increasing number of requirements from various regulatory documents. Which requirements are relevant depends on aspects such as the geographic location of the organization, its domain, size, and business processes. Considering these contextual factors, relevant documents (e.g., laws, regulations, directives, policies) are identified as a first step, followed by a more detailed analysis of which parts of the identified documents are relevant for which step of a given business process. Today, the identification of regulatory requirements relevant to business processes is mostly done manually by domain and legal experts, which demands tremendous effort from them, especially for a large number of regulatory documents that may change frequently. Hence, this work examines how legal and domain experts can be assisted in the assessment of relevant requirements. To this end, we compare an embedding-based NLP ranking method, a generative AI method using GPT-4, and a crowdsourced method with the purely manual method of creating relevancy labels by experts. The proposed methods are evaluated based on two case studies: an Australian insurance case created with domain experts and a global banking use case, adapted from SAP Signavio's workflow example of an international guideline. A gold standard is created for both BPMN 2.0 processes and matched to real-world textual requirements from multiple regulatory documents. The evaluation and discussion provide insights into the strengths and weaknesses of each method regarding applicability, automation, transparency, and reproducibility, and provide guidelines on which method combinations will maximize benefits for given characteristics such as process usage, impact, and dynamics of an application scenario.
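
The embedding-based ranking idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the embedding model, so a bag-of-words count vector stands in for a learned sentence embedding, and the function names (`embed`, `cosine`, `rank_requirements`) and the example texts are hypothetical. The sketch only shows the general pattern of scoring each requirement against a process step description and sorting by similarity.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: a bag-of-words count vector.

    Assumption: a real system would use a learned embedding model instead.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_requirements(step: str, requirements: list[str]) -> list[tuple[float, str]]:
    """Score every requirement against the process step, highest score first."""
    step_vec = embed(step)
    scored = [(cosine(step_vec, embed(req)), req) for req in requirements]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical process step and requirement texts, for illustration only.
step = "Assess the insurance claim and notify the customer"
requirements = [
    "The insurer must notify the customer of the claim decision in writing.",
    "Banks must verify the identity of account holders.",
    "Records must be retained for seven years.",
]
ranked = rank_requirements(step, requirements)
for score, req in ranked:
    print(f"{score:.2f}  {req}")
```

Because the toy vectors count every word equally, frequent words such as "the" inflate the scores. A learned embedding or a term-weighting scheme would be the natural replacement for `embed` in practice.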

