Leveraging Transformer-based Language Models to Automate Requirements Satisfaction Assessment (2312.04463v1)
Abstract: Requirements Satisfaction Assessment (RSA) evaluates whether the set of design elements linked to a single requirement provides sufficient coverage of that requirement -- typically meaning that every concept in the requirement is addressed by at least one of the design elements. RSA is an important software engineering activity for systems with any form of hierarchical decomposition -- especially safety- or mission-critical ones. In previous studies, researchers used basic Information Retrieval (IR) models to decompose requirements and design elements into chunks, and then evaluated the extent to which the design-element chunks covered all chunks in the requirement. However, accuracy was low because critical concepts that extend across an entire sentence were poorly represented once the sentence was parsed into independent chunks. In this paper we leverage recent advances in natural language processing to deliver significantly more accurate results. We propose two major architectures, Satisfaction BERT (Sat-BERT) and Dual-Satisfaction BERT (DSat-BERT), along with their multitask learning variants, to improve satisfaction assessments. We perform RSA on five different datasets and compare results from our variants against the chunk-based legacy approach. All BERT-based models significantly outperformed the legacy baseline, and Sat-BERT delivered the best results, with an average improvement of 124.75% in Mean Average Precision.
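The chunk-based legacy approach described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy chunker (splitting on commas), the bag-of-words cosine similarity, and the `threshold` value are all simplifying assumptions standing in for the IR machinery used in the original studies. A requirement counts as covered to the degree that each of its chunks is matched by at least one design-element chunk:

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunks(text: str) -> list[Counter]:
    """Toy chunker: split on commas; real systems used NP/VP chunking."""
    return [Counter(part.lower().split()) for part in text.split(",") if part.strip()]

def coverage(requirement: str, design_elements: list[str], threshold: float = 0.3) -> float:
    """Fraction of requirement chunks matched by some design-element chunk."""
    req_chunks = chunks(requirement)
    design_chunks = [c for d in design_elements for c in chunks(d)]
    covered = sum(
        1 for rc in req_chunks
        if any(cosine(rc, dc) >= threshold for dc in design_chunks)
    )
    return covered / len(req_chunks) if req_chunks else 0.0

req = "The thermostat shall monitor air temperature, and raise an alarm on overheating"
designs = [
    "Temperature sensor polls air temperature every second",
    "Alarm module signals when temperature exceeds the limit",
]
score = coverage(req, designs)  # one of the two requirement chunks is covered
```

The sketch also shows the failure mode the paper targets: the second requirement chunk ("raise an alarm on overheating") shares only one surface term with the alarm design element, so lexical chunk matching misses a concept that a sentence-level model such as Sat-BERT can capture.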