Overview of "ChatGPT and Software Testing Education: Promises and Perils"
The paper "ChatGPT and Software Testing Education: Promises and Perils" explores the potential benefits and challenges of integrating ChatGPT, a conversational agent developed by OpenAI, into software testing education. As sophisticated LLMs increasingly infiltrate educational contexts, their impact on student learning and instructor practices becomes imperative to understand. This paper presents a meticulous investigation into ChatGPT's performance within a software testing curriculum, specifically examining its efficacy in addressing common educational tasks.
Key Findings and Contributions
The paper centers on assessing ChatGPT's ability to answer questions drawn from the widely used software testing textbook by Ammann and Offutt. In total, the authors curated a dataset of 31 exercise questions from five chapters of the textbook. Through empirical evaluation, they measured how often ChatGPT produced correct answers and explanations, with particular attention to shared versus separate prompting contexts, a key consideration in LLM interactions.
The evaluation shows that ChatGPT provides correct or partially correct answers to 55.6% of the questions and satisfactory explanations in 53.0% of cases, indicating moderate utility in educational settings. Notably, results improve when questions are posed within a shared conversation context rather than in separate, independent contexts, underscoring the importance of context in prompting LLMs effectively.
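The distinction between shared and separate contexts can be made concrete with a short sketch. The snippet below is a minimal illustration, not the authors' actual setup: it assumes the OpenAI Python client (the study used the ChatGPT web interface), a placeholder model name, and example questions written in the spirit of the textbook exercises rather than taken from the curated dataset.

```python
# Minimal sketch of the two prompting setups discussed above, using the
# OpenAI Python client as a stand-in for the ChatGPT web interface.
# Model name and questions are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-3.5-turbo"  # assumed model name for illustration

questions = [
    "Explain the difference between a fault, an error, and a failure.",
    "Give a test case in which a fault is executed but no failure occurs.",
]

def ask_separately(questions: list[str]) -> list[str]:
    """Each question starts a fresh conversation (separate context)."""
    answers = []
    for q in questions:
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": q}],
        )
        answers.append(resp.choices[0].message.content)
    return answers

def ask_in_shared_context(questions: list[str]) -> list[str]:
    """All questions go into one growing conversation (shared context)."""
    history, answers = [], []
    for q in questions:
        history.append({"role": "user", "content": q})
        resp = client.chat.completions.create(model=MODEL, messages=history)
        reply = resp.choices[0].message.content
        history.append({"role": "assistant", "content": reply})
        answers.append(reply)
    return answers
```

In the shared-context variant, each new question is answered with the earlier questions and answers still visible to the model, which is the setup the paper found to produce better results.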
Moreover, the authors report inconsistencies in ChatGPT's answers arising from the non-deterministic nature of LLM outputs. Approximately 9.7% of questions yielded answers whose correctness varied across repeated prompts, and 6.5% of explanations varied similarly. These findings highlight the challenges of deploying AI-based systems for educational purposes, where reliability and consistency are crucial.
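A repeated-prompting check of the kind that surfaces this inconsistency might look as follows. This is a sketch under stated assumptions: it uses the OpenAI Python client and a placeholder model name, and it leaves grading to a human rater, as in the study.

```python
# Re-ask the same question in several independent (separate-context)
# conversations; a question counts as inconsistent when human grading of the
# sampled answers disagrees across runs (e.g. correct in one run, incorrect
# in another). Client and model name are assumptions for illustration.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-3.5-turbo"  # placeholder model name

def sample_answers(question: str, trials: int = 3) -> list[str]:
    """Collect independent answers to one question for later manual grading."""
    answers = []
    for _ in range(trials):
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": question}],
        )
        answers.append(resp.choices[0].message.content)
    return answers
```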
Implications and Future Directions
The implications of this research extend to both practical and theoretical domains in AI and education. Practically, the integration of models like ChatGPT presents educational practice with unprecedented challenges and opportunities. On one hand, instructors must mitigate the risk that students use these models to sidestep genuine learning, even as they leverage AI's capabilities to enhance educational experiences. On the other hand, promising avenues include deploying AI assistance to guide students through complex tasks or to provide immediate feedback during learning activities.
Theoretically, the findings contribute to a more nuanced understanding of LLMs' operational limits and pave the way for refining prompting and interaction strategies. In particular, context-rich, multi-turn interactions could enhance the reliability of AI responses, a hypothesis that warrants further exploration.
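One possible form such a context-rich interaction could take is seeding the conversation with relevant reference material before posing the exercise question. The sketch below is a hypothetical strategy, not the paper's protocol; the excerpt, model name, and client are illustrative assumptions.

```python
# Hypothetical "context-rich" prompt: supply course material first, then ask
# the exercise question in the same conversation. All content placeholders
# below are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-3.5-turbo"  # placeholder model name

chapter_excerpt = "<relevant definitions from the chapter under study>"
exercise = "<an exercise question from the same chapter>"

resp = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are assisting with a software testing course."},
        {"role": "user", "content": f"Reference material:\n{chapter_excerpt}"},
        {"role": "assistant", "content": "Understood. I will use this material."},
        {"role": "user", "content": exercise},
    ],
)
print(resp.choices[0].message.content)
```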
Conclusion
In conclusion, while ChatGPT exhibits a reasonable level of proficiency in answering software testing questions, its use in educational settings requires careful consideration of question design, context provisioning, and system reliability. As AI continues to evolve and permeate educational practice, ongoing research must focus on optimizing the interplay between human learners and AI systems to foster productive and secure learning environments. This paper is a valuable contribution to that dialogue, highlighting ChatGPT's capabilities and limitations in the context of software testing education. The broader discourse around AI's role in learning environments remains rich with potential, but it must be balanced with scrutiny to ensure positive educational outcomes.