A Study on the Vulnerability of Test Questions against ChatGPT-based Cheating (2402.14881v1)
Abstract: ChatGPT is a chatbot that answers text prompts fairly accurately, performing well even on postgraduate-level questions. Many educators have found that their take-home or remote tests and exams are vulnerable to ChatGPT-based cheating, because students may submit answers produced by tools like ChatGPT verbatim. In this paper, we address two related questions: how well ChatGPT answers test questions, and how to detect whether the questions of a test can be answered correctly by ChatGPT. We generated ChatGPT's responses to the MedMCQA dataset, which contains over 10,000 medical school entrance exam questions, and analyzed the responses to uncover the types of questions ChatGPT answers less accurately than others. In addition, we built a basic natural language processing model that singles out the questions most vulnerable to ChatGPT in a collection of questions or a sample exam. Test-makers can use our tool to avoid ChatGPT-vulnerable test questions.
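To make the detection idea concrete, below is a minimal, hypothetical sketch of the kind of "vulnerability filter" the abstract describes: train a simple text classifier on questions labeled by whether ChatGPT answered them correctly, then flag the questions a test-maker should avoid. The specifics here (the `chatgpt_correct` labels, the TF-IDF plus logistic-regression pipeline, the 0.5 threshold) are illustrative assumptions, not the paper's actual model.

```python
# Hypothetical sketch of a "ChatGPT-vulnerability" filter for test questions.
# Assumes we already have questions labeled with whether ChatGPT answered them
# correctly (as the paper does for MedMCQA); the pipeline below is a simple
# stand-in for the paper's NLP model, not its actual architecture.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled data: 1 = ChatGPT answered correctly (vulnerable question),
# 0 = ChatGPT answered incorrectly (safer to use on an exam).
questions = [
    "Which vitamin deficiency causes scurvy?",
    "Which hormone is secreted by the posterior pituitary?",
    "All of the following are branches of the external carotid artery EXCEPT:",
    "Which of the following statements about renal physiology is FALSE?",
]
chatgpt_correct = [1, 1, 0, 0]

# TF-IDF features + logistic regression: a deliberately basic classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(questions, chatgpt_correct)

def flag_vulnerable(exam_questions, threshold=0.5):
    """Return the questions ChatGPT is predicted to answer correctly."""
    probs = model.predict_proba(exam_questions)[:, 1]
    return [q for q, p in zip(exam_questions, probs) if p >= threshold]

draft_exam = ["Which organism most commonly causes lobar pneumonia?"]
print(flag_vulnerable(draft_exam))
```

In practice the labels would come from the first step the abstract describes, i.e., scoring ChatGPT's answers against the MedMCQA answer key, and a stronger text encoder could replace the TF-IDF features; the filtering logic would stay the same.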
Authors: Shanker Ram, Chen Qian