Exploring the Capabilities of Prompted Large Language Models in Educational and Assessment Applications (2405.11579v1)
Abstract: In the era of generative AI, large language models (LLMs) offer unprecedented opportunities for innovation in modern education. We explore prompted LLMs in the context of educational and assessment applications to uncover their potential. Through a series of carefully crafted research questions, we investigate the effectiveness of prompt-based techniques in generating open-ended questions from school-level textbooks, assess their efficiency in generating open-ended questions from undergraduate-level technical textbooks, and explore the feasibility of a chain-of-thought-inspired multi-stage prompting approach for language-agnostic multiple-choice question (MCQ) generation. Additionally, we evaluate the ability of prompted LLMs to support language learning through a case study on explaining grammatical errors in Bengali, a low-resource Indian language. We also evaluate the potential of prompted LLMs to assess human resource (HR) spoken interview transcripts. By juxtaposing the capabilities of LLMs with those of human experts across these educational tasks and domains, we aim to shed light on the potential and limitations of LLMs in reshaping educational practices.
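To make the multi-stage prompting idea concrete, the sketch below shows one plausible way such a pipeline could be wired together: the model is prompted in separate stages to pick a key fact, write a question stem for it, and propose distractors. This is a minimal sketch under assumptions, not the authors' actual pipeline: the `generate_mcq` function, the `complete` callable, the stage breakdown, and the prompt wording are all illustrative placeholders.

```python
# Illustrative sketch of a chain-of-thought-inspired multi-stage prompting
# pipeline for language-agnostic MCQ generation. The stages, prompts, and the
# `complete` callable are assumptions for illustration, not the paper's method.
from typing import Callable, List


def generate_mcq(passage: str, complete: Callable[[str], str],
                 num_distractors: int = 3) -> dict:
    """Generate one MCQ from a passage via three prompting stages."""
    # Stage 1: ask the model to pick a key fact (the eventual correct answer).
    answer = complete(
        "Read the passage and state one key fact as a short phrase, "
        f"in the same language as the passage.\n\nPassage:\n{passage}"
    ).strip()

    # Stage 2: ask for a question stem whose correct answer is that fact.
    question = complete(
        "Write a multiple-choice question stem whose correct answer is "
        f"'{answer}', based on this passage:\n{passage}"
    ).strip()

    # Stage 3: ask for plausible but incorrect distractor options.
    distractor_text = complete(
        f"Give {num_distractors} plausible but incorrect answer options for "
        f"the question '{question}' (correct answer: '{answer}'), one per line."
    )
    distractors: List[str] = [
        line.strip("- ").strip()
        for line in distractor_text.splitlines()
        if line.strip()
    ]

    return {"question": question, "answer": answer,
            "distractors": distractors[:num_distractors]}


if __name__ == "__main__":
    # Dummy completer for a dry run; a real LLM client would be swapped in here.
    dummy = lambda prompt: "placeholder response"
    print(generate_mcq("Water boils at 100 degrees Celsius at sea level.", dummy))
```

Splitting answer selection, stem writing, and distractor generation into separate prompts reflects the chain-of-thought intuition that intermediate outputs from earlier stages can guide later ones.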
Authors: Subhankar Maity, Aniket Deroy, Sudeshna Sarkar