Improving Requirements Completeness: Automated Assistance through Large Language Models (2308.03784v2)
Abstract: Natural language (NL) is arguably the most prevalent medium for expressing systems and software requirements. Detecting incompleteness in NL requirements is a major challenge. One approach to identifying incompleteness is to compare requirements with external sources. Given the rise of large language models (LLMs), an interesting question arises: Are LLMs useful external sources of knowledge for detecting potential incompleteness in NL requirements? This article explores this question using BERT. Specifically, we employ BERT's masked language model (MLM) to generate contextualized predictions for filling masked slots in requirements. To simulate incompleteness, we withhold content from the requirements and assess BERT's ability to predict terminology that is present in the withheld content but absent from the disclosed content. BERT can produce multiple predictions per mask. Our first contribution is determining the optimal number of predictions per mask, striking a balance between effectively identifying omissions in requirements and mitigating noise in the predictions. Our second contribution is a machine learning-based filter that post-processes BERT's predictions to further reduce noise. We conduct an empirical evaluation using 40 requirements specifications from the PURE dataset. Our findings indicate that: (1) BERT's predictions effectively highlight terminology that is missing from requirements, (2) BERT outperforms simpler baselines in identifying relevant yet missing terminology, and (3) our filter significantly reduces noise in the predictions, enhancing BERT's effectiveness as a tool for completeness checking of requirements.
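The withhold-and-predict setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `evaluate_omission_hints`, the `Predictor` signature, the two ratios it returns, and the lookup-table `toy_predict` are all hypothetical names introduced here for illustration. In a real setting the predictor would presumably wrap a masked language model such as BERT; the toy stand-in only keeps the example self-contained.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical predictor signature (an assumption of this sketch):
# (tokens, masked_index, k) -> up to k candidate words for the masked slot.
Predictor = Callable[[List[str], int, int], List[str]]


def evaluate_omission_hints(
    disclosed: List[List[str]],
    withheld: List[List[str]],
    predict: Predictor,
    k: int,
) -> Tuple[Set[str], float, float]:
    """Mask every slot in the disclosed text and score the predictions.

    Returns (hits, coverage, precision), where hits are predicted terms that
    occur in the withheld text but not in the disclosed text.
    """
    disclosed_vocab = {w for sent in disclosed for w in sent}
    withheld_vocab = {w for sent in withheld for w in sent}
    # Terms worth finding: present in withheld content, absent from disclosed.
    targets = withheld_vocab - disclosed_vocab

    predicted: Set[str] = set()
    for tokens in disclosed:
        for i in range(len(tokens)):  # mask one slot at a time
            predicted.update(predict(tokens, i, k))

    # Only terms not already disclosed can hint at an omission.
    candidates = predicted - disclosed_vocab
    hits = candidates & targets
    coverage = len(hits) / len(targets) if targets else 0.0
    precision = len(hits) / len(candidates) if candidates else 0.0
    return hits, coverage, precision


def toy_predict(tokens: List[str], masked_index: int, k: int) -> List[str]:
    """Toy stand-in for a masked language model: looks up the left neighbour."""
    table = {"user": ["password", "account", "system"]}
    left = tokens[masked_index - 1] if masked_index > 0 else ""
    return table.get(left, [])[:k]


# Illustrative data only: one disclosed and one withheld requirement.
disclosed = [["the", "user", "enters", "data"]]
withheld = [["the", "password", "is", "checked"]]

hits_k1, coverage_k1, precision_k1 = evaluate_omission_hints(
    disclosed, withheld, toy_predict, k=1
)
hits_k3, coverage_k3, precision_k3 = evaluate_omission_hints(
    disclosed, withheld, toy_predict, k=3
)
```

On this toy data, raising `k` from 1 to 3 finds the same single withheld term ("password") but adds two candidates that match nothing in the withheld text. This is the noise trade-off that motivates tuning the number of predictions per mask and filtering the predictions afterwards.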