Requirements Satisfiability with In-Context Learning (2404.12576v1)
Abstract: Large language models (LLMs) can learn a task at inference time through in-context learning (ICL), a capability that shows increasing promise for natural language inference tasks. In ICL, a model user constructs a prompt that describes a task with a natural language instruction and zero or more examples, called demonstrations. The prompt is then input to the LLM to generate a completion. In this paper, we apply ICL to the design and evaluation of satisfaction arguments, which describe how a requirement is satisfied by a system specification and associated domain knowledge. The approach builds on three prompt design patterns: augmented generation, prompt tuning, and chain-of-thought prompting. It is evaluated on a privacy problem to check whether a mobile app scenario and an associated design description satisfy eight consent requirements from the EU General Data Protection Regulation (GDPR). The overall results show that GPT-4 can be used to verify requirements satisfaction with 96.7% accuracy and dissatisfaction with 93.2% accuracy. Inverting the requirement improves verification of dissatisfaction to 97.2%, and chain-of-thought prompting improves overall GPT-3.5 accuracy by 9.0%. We discuss the trade-offs among templates, models, and prompt strategies, and provide a detailed analysis of the generated specifications to inform how the approach can be applied in practice.
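The prompt recipe the abstract describes (a natural language instruction, zero or more demonstrations, then the query, optionally closed with a chain-of-thought trigger) can be made concrete with a short sketch. Satisfaction arguments here can be read, roughly in the style of Jackson's "The World and the Machine", as the entailment W, S ⊢ R: world (domain) knowledge W and specification S together entail requirement R. Everything in the sketch below, including the instruction wording, the example requirement and specification, and the `build_prompt` helper, is an illustrative assumption rather than the paper's actual prompt templates.

```python
# A minimal sketch of ICL prompt construction, assuming hypothetical
# requirement/specification texts: an instruction, zero or more
# demonstrations, and a query ending in a chain-of-thought trigger.

INSTRUCTION = (
    "You are checking requirements satisfaction. Given a requirement "
    "and a system specification with domain knowledge, decide whether "
    "the specification satisfies the requirement. Answer SATISFIED or "
    "NOT SATISFIED with a brief justification."
)

# Zero or more demonstrations (few-shot examples); an empty list
# yields a zero-shot prompt.
DEMONSTRATIONS = [
    {
        "requirement": "Consent must be obtained before personal data "
                       "is collected.",
        "specification": "The app shows a consent dialog on first "
                         "launch and blocks data collection until the "
                         "user taps Accept.",
        "answer": "SATISFIED: collection is gated on an affirmative act.",
    },
]

def build_prompt(requirement: str, specification: str) -> str:
    """Assemble instruction + demonstrations + query into one prompt."""
    parts = [INSTRUCTION, ""]
    for demo in DEMONSTRATIONS:
        parts += [
            f"Requirement: {demo['requirement']}",
            f"Specification: {demo['specification']}",
            f"Answer: {demo['answer']}",
            "",
        ]
    parts += [
        f"Requirement: {requirement}",
        f"Specification: {specification}",
        # Zero-shot chain-of-thought trigger (Kojima et al., 2022).
        "Answer: Let's think step by step.",
    ]
    return "\n".join(parts)

if __name__ == "__main__":
    print(build_prompt(
        "Withdrawal of consent must be as easy as giving consent.",
        "Settings contains a single toggle that revokes consent and "
        "halts all analytics uploads.",
    ))
```

The resulting string would then be sent to the model of choice (e.g., GPT-4 or GPT-3.5 in the paper's evaluation); inverting the requirement text before calling `build_prompt` corresponds to the dissatisfaction check the abstract reports.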