Requirements Satisfiability with In-Context Learning (2404.12576v1)

Published 19 Apr 2024 in cs.SE

Abstract: LLMs that can learn a task at inference time, called in-context learning (ICL), show increasing promise in natural language inference tasks. In ICL, a model user constructs a prompt to describe a task with a natural language instruction and zero or more examples, called demonstrations. The prompt is then input to the LLM to generate a completion. In this paper, we apply ICL to the design and evaluation of satisfaction arguments, which describe how a requirement is satisfied by a system specification and associated domain knowledge. The approach builds on three prompt design patterns, including augmented generation, prompt tuning, and chain-of-thought prompting, and is evaluated on a privacy problem to check whether a mobile app scenario and associated design description satisfies eight consent requirements from the EU General Data Protection Regulation (GDPR). The overall results show that GPT-4 can be used to verify requirements satisfaction with 96.7% accuracy and dissatisfaction with 93.2% accuracy. Inverting the requirement improves verification of dissatisfaction to 97.2%. Chain-of-thought prompting improves overall GPT-3.5 performance by 9.0% accuracy. We discuss the trade-offs among templates, models and prompt strategies and provide a detailed analysis of the generated specifications to inform how the approach can be applied in practice.
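To make the prompt-construction idea concrete, here is a minimal sketch of how an ICL satisfaction check of this kind might be assembled, assuming the `openai` Python client (v1+) with an `OPENAI_API_KEY` in the environment. The requirement, specification, and demonstration text are illustrative placeholders, not the paper's actual templates or GDPR materials; the `invert` flag mirrors the requirement-inversion strategy described in the abstract.

```python
# Sketch of an in-context learning prompt for requirements satisfaction checking.
# All text below is illustrative; it is not the authors' prompt template.
from openai import OpenAI

REQUIREMENT = (
    "Consent must be obtained from the data subject before personal data are collected."
)

SPECIFICATION = (
    "The mobile app displays a consent dialog describing the data collected and does "
    "not record location data until the user taps 'I agree'."
)

# One worked demonstration (few-shot); a zero-shot prompt would omit this.
DEMONSTRATION = (
    "Requirement: Users must be able to withdraw consent at any time.\n"
    "Specification: The settings screen provides a 'Revoke consent' switch that "
    "immediately stops data collection.\n"
    "Reasoning: The specification gives the user a withdrawal mechanism and ties it "
    "to stopping collection, so the requirement is met.\n"
    "Answer: SATISFIED\n"
)


def build_prompt(requirement: str, specification: str, invert: bool = False) -> str:
    """Compose an instruction + demonstration + chain-of-thought prompt.

    If `invert` is True, the requirement is negated so the model argues for
    dissatisfaction instead, mirroring the inversion strategy in the abstract.
    """
    req = f"It is NOT the case that: {requirement}" if invert else requirement
    return (
        "You are checking whether a system specification satisfies a privacy "
        "requirement. Think step by step, then answer SATISFIED or NOT SATISFIED.\n\n"
        f"{DEMONSTRATION}\n"
        f"Requirement: {req}\n"
        f"Specification: {specification}\n"
        "Reasoning:"
    )


if __name__ == "__main__":
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    completion = client.chat.completions.create(
        model="gpt-4",   # the paper also reports GPT-3.5 results
        temperature=0,   # deterministic completions for evaluation
        messages=[{"role": "user", "content": build_prompt(REQUIREMENT, SPECIFICATION)}],
    )
    print(completion.choices[0].message.content)
```

The final "Answer: SATISFIED / NOT SATISFIED" line can be parsed to score verification accuracy against labeled requirements, and running the same prompt with `invert=True` gives the dissatisfaction check.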
