Large Language Model based Situational Dialogues for Second Language Learning (2403.20005v1)
Abstract: In second language learning, scenario-based conversation practice is important for learners to achieve speaking fluency, yet students often lack sufficient opportunities to practice their conversational skills with qualified instructors or native speakers. To bridge this gap, we propose situational dialogue models for students to engage in conversational practice. Our situational dialogue models are built by fine-tuning LLMs, with the aim of combining the engaging nature of open-ended conversation with the focused practice of scenario-based tasks. Leveraging the generalization capabilities of LLMs, we demonstrate that our situational dialogue models perform effectively not only on training topics but also on topics not encountered during training. This offers a promising solution for supporting a wide range of conversational topics without extensive manual work. Additionally, the field of dialogue systems still lacks reliable automatic evaluation metrics, so human evaluation remains the gold standard (Smith et al., 2022), which is typically expensive. To address the limitations of existing evaluation methods, we present a novel automatic evaluation method that employs fine-tuned LLMs to efficiently and effectively assess the performance of situational dialogue models.
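The abstract does not include an implementation, but the core evaluation idea (using a fine-tuned LLM as an automatic judge of a situational dialogue model's replies) can be illustrated with a minimal sketch. The scenario wording, the 1–5 rubric, and the `generate_fn` callable below are hypothetical placeholders, not details taken from the paper.

```python
# Minimal sketch of LLM-based automatic evaluation for a situational dialogue
# model. Prompt wording, the 1-5 scale, and generate_fn are illustrative
# assumptions, not the paper's actual method.
from typing import Callable, Dict, List
import re


def build_judge_prompt(scenario: str, dialogue: List[Dict[str, str]], reply: str) -> str:
    """Format the scenario, dialogue history, and candidate reply for the judge model."""
    history = "\n".join(f"{turn['role']}: {turn['text']}" for turn in dialogue)
    return (
        f"Scenario: {scenario}\n"
        f"Dialogue so far:\n{history}\n"
        f"Candidate tutor reply: {reply}\n"
        "Rate how appropriate and on-topic the reply is for this scenario "
        "on a scale of 1 to 5. Answer with a single number."
    )


def judge_reply(generate_fn: Callable[[str], str], scenario, dialogue, reply) -> int:
    """Query a (fine-tuned) judge LLM via generate_fn and parse its numeric score."""
    output = generate_fn(build_judge_prompt(scenario, dialogue, reply))
    match = re.search(r"[1-5]", output)
    if match is None:
        raise ValueError(f"Could not parse a score from judge output: {output!r}")
    return int(match.group())


if __name__ == "__main__":
    # Stub judge for demonstration; a real setup would call the fine-tuned LLM here.
    score = judge_reply(
        lambda prompt: "4",
        scenario="Ordering food at a restaurant",
        dialogue=[{"role": "student", "text": "Could I see the menu, please?"}],
        reply="Of course! Here is our menu. Would you like something to drink first?",
    )
    print(score)  # -> 4
```

In practice, scores from such a judge would be averaged over many scenarios and turns, and checked for agreement with human raters before being trusted as a substitute for human evaluation.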
- Qwen technical report.
- Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901.
- Grammatical error correction: A survey of the state of the art. Computational Linguistics, 49(3):643–701.
- End-to-end neural network based automated speech scoring. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP), pages 6234–6238. IEEE.
- Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416.
- Robert M. DeKeyser. 2007. Study abroad as foreign language practice, Cambridge Applied Linguistics, pages 208–226. Cambridge University Press.
- Attention-based recurrent convolutional neural network for automatic essay scoring. In Proceedings of the 21st conference on computational natural language learning (CoNLL 2017), pages 153–162.
- An interactive dialog system for learning Japanese. Speech Communication, 30(2):167–177.
- Approximating interactive human evaluation with self-play for open-domain dialog systems. Advances in Neural Information Processing Systems, 32.
- A chatbot for a dialogue-based second language learning system. CALL in a climate of change: adapting to turbulent global conditions–short papers from EUROCALL, pages 151–156.
- ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103:102274.
- Task graph based task-oriented dialogue system using dialogue map for second language learning. Future-proof CALL: language learning as exploration and encounters–short papers from EUROCALL 2018, pages 153–159.
- Prompted LLMs as chatbot modules for long open-domain conversation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 4536–4554, Toronto, Canada. Association for Computational Linguistics.
- Evaluating human-language model interaction.
- Developing a task-based dialogue system for English language learning. Education Sciences, 10(11):306.
- ACUTE-EVAL: Improved dialogue evaluation with optimized questions and multi-turn comparisons. arXiv preprint arXiv:1909.03087.
- Using chatbots to teach languages. In Proceedings of the Ninth ACM Conference on Learning @ Scale (L@S '22), pages 451–455. ACM.
- Gunrock 2.0: A user adaptive social conversational system. arXiv preprint arXiv:2011.08906.
- How not to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. arXiv preprint arXiv:1603.08023.
- Towards an automatic Turing test: Learning to evaluate dialogue responses. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1116–1126.
- David Nunan. 2004. Task-based language teaching. Cambridge University Press.
- OpenAI. 2023. ChatGPT: Optimizing language models for dialogue. https://chat.openai.com. Accessed: 2023-12-01.
- OpenAI. 2023. GPT-4 technical report.
- Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
- Multitask prompted training enables zero-shot task generalization. arXiv preprint arXiv:2110.08207.
- Human evaluation of conversations is an open problem: comparing the sensitivity of various methods for evaluating dialogue agents. In Proceedings of the 4th Workshop on NLP for Conversational AI, pages 77–97, Dublin, Ireland. Association for Computational Linguistics.
- Kaveh Taghipour and Hwee Tou Ng. 2016. A neural approach to automated essay scoring. In Proceedings of the 2016 conference on empirical methods in natural language processing, pages 1882–1891.
- Stanford Alpaca: An instruction-following LLaMA model. https://github.com/tatsu-lab/stanford_alpaca.
- Theories in second language acquisition: An introduction. Routledge.
- Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652.
- Emergent abilities of large language models. arXiv preprint arXiv:2206.07682.
- Erroneous data generation for grammatical error correction. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 149–158.
- A survey of large language models. arXiv preprint arXiv:2303.18223.
- Judging LLM-as-a-judge with MT-Bench and Chatbot Arena.
- FIREBALL: A dataset of Dungeons and Dragons actual-play with structured game state information. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4171–4193, Toronto, Canada. Association for Computational Linguistics.
Authors: Shuyao Xu, Long Qin, Tianyang Chen, Zhenzhou Zha, Bingxue Qiu, Weizhi Wang