Enhancing Multi-Domain Automatic Short Answer Grading through an Explainable Neuro-Symbolic Pipeline (2403.01811v2)
Abstract: Automatically grading short-answer questions with interpretable reasoning behind the grading decision is a challenging goal for current transformer-based approaches. Justification cue detection, combined with logical reasoners, has shown promise as a direction for neuro-symbolic architectures in automatic short answer grading (ASAG). However, a main challenge is the requirement for annotated justification cues in students' responses, which exist for only a few ASAG datasets. To overcome this challenge, we contribute (1) a weakly supervised annotation procedure for justification cues in ASAG datasets, and (2) a neuro-symbolic model for explainable ASAG based on justification cues. Our approach improves RMSE by 0.24 to 0.3 over the state of the art on the Short Answer Feedback dataset in a bilingual, multi-domain, and multi-question training setup. This result shows that our approach offers a promising direction for generating high-quality grades and accompanying explanations for future research in ASAG and educational NLP.