CDGP: Automatic Cloze Distractor Generation based on Pre-trained Language Model (2403.10326v1)
Published 15 Mar 2024 in cs.CL, cs.AI, and cs.LG
Abstract: Manually designing cloze tests consumes enormous time and effort. The major challenge lies in selecting the wrong options (distractors): carefully designed distractors improve the effectiveness of learner ability assessment. This motivates the idea of automatically generating cloze distractors. In this paper, we investigate cloze distractor generation by exploring the use of pre-trained language models (PLMs) as an alternative for candidate distractor generation. Experiments show that the PLM-enhanced model brings a substantial performance improvement. Our best-performing model advances the state-of-the-art result from 14.94 to 34.17 (NDCG@10 score). Our code and dataset are available at https://github.com/AndyChiangSH/CDGP.
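The core idea the abstract describes is to let a PLM propose candidate distractors for a cloze blank. Below is a minimal sketch of that idea using Hugging Face's fill-mask pipeline; the model name, example sentence, and `top_k` value are illustrative assumptions rather than the paper's exact configuration (CDGP's actual pipeline is in the linked repository).

```python
# Minimal sketch: using a fill-mask PLM to propose cloze distractor candidates.
# Assumptions (not the paper's exact setup): bert-base-uncased as the PLM,
# top_k=10 candidates, and a hand-written example stem.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# A cloze stem with the blank marked by the model's own mask token.
stem = f"The students were asked to {fill_mask.tokenizer.mask_token} the experiment before class."
answer = "repeat"

# The PLM scores vocabulary items for the blank; high-probability words
# other than the correct answer serve as distractor candidates.
candidates = [
    pred["token_str"].strip()
    for pred in fill_mask(stem, top_k=10)
    if pred["token_str"].strip().lower() != answer
]
print(candidates)  # e.g. plausible-but-wrong fillers such as 'perform', 'finish'
```

A ranking step would then order these candidates (the paper reports NDCG@10 over such rankings), keeping only those that fit the context grammatically but are semantically incorrect for the blank.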