Japanese-English Sentence Translation Exercises Dataset for Automatic Grading (2403.03396v1)
Abstract: This paper proposes the task of automatically assessing Sentence Translation Exercises (STEs), which are used in the early stages of L2 language learning. We formalize the task as grading student responses against each rubric criterion pre-specified by educators. We then create a Japanese-English STE dataset comprising 21 questions and a total of 3,498 student responses (167 per question on average). The responses were collected from students and crowd workers. Using this dataset, we evaluate baselines including fine-tuned BERT and GPT models with few-shot in-context learning. Experimental results show that the fine-tuned BERT baseline classifies correct responses with an F1 of approximately 90%, but achieves less than 80% for incorrect responses. Furthermore, the GPT models with few-shot learning perform worse than fine-tuned BERT, indicating that the proposed task remains challenging even for state-of-the-art LLMs.
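The abstract reports F1 separately for correct and incorrect responses. As a rough illustration of why the two numbers can diverge, here is a minimal Python sketch of per-label F1 over per-criterion grades. The record layout, the field names, the criterion name `"tense"`, the binary label set, and the toy data are all assumptions made for illustration; they are not taken from the paper or its dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriterionGrade:
    """One grading decision for one response on one rubric criterion.

    All field names are hypothetical; the dataset's real schema may differ.
    """
    response_id: str
    criterion: str
    gold: str        # label assigned by a human grader
    predicted: str   # label assigned by a model


def f1_for_label(grades, label):
    """Return the F1 score obtained when `label` is treated as the positive class."""
    tp = sum(1 for g in grades if g.gold == label and g.predicted == label)
    fp = sum(1 for g in grades if g.gold != label and g.predicted == label)
    fn = sum(1 for g in grades if g.gold == label and g.predicted != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented): the majority label is easier to get right.
grades = [
    CriterionGrade("r1", "tense", "correct", "correct"),
    CriterionGrade("r2", "tense", "correct", "correct"),
    CriterionGrade("r3", "tense", "correct", "correct"),
    CriterionGrade("r4", "tense", "incorrect", "correct"),
    CriterionGrade("r5", "tense", "incorrect", "incorrect"),
]

f1_correct = f1_for_label(grades, "correct")
f1_incorrect = f1_for_label(grades, "incorrect")
print(f"F1 (correct):   {f1_correct:.3f}")
print(f"F1 (incorrect): {f1_incorrect:.3f}")

# The reported average follows from the totals given in the abstract.
print(round(3498 / 21))
```

In this toy set a single missed incorrect response lowers the F1 for the incorrect label far more than the F1 for the correct label, which is one plausible reason to report the two scores separately when labels are imbalanced.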