Do LLMs Implicitly Determine the Suitable Text Difficulty for Users? (2402.14453v1)
Abstract: Education suited to each student's learning level is necessary to improve their understanding. A first step toward achieving this goal with LLMs is to adjust the difficulty of the text generated in response to students. This work analyzes whether LLMs can implicitly match the difficulty of their generated text to that of the user's input. For our experiments, we created a new dataset from Stack Overflow to evaluate performance on question-answering-based conversation. Experimental results on the Stack Overflow dataset and the TSCC dataset, which includes multi-turn conversations, show that LLMs can implicitly adjust text difficulty between the user input and the generated response. We also observed that some LLMs surpass humans in handling text difficulty, and that instruction tuning is important for this ability.
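The paper's notion of "text difficulty" rests on the classic readability formulas cited in the reference list below (Kincaid et al., 1975; Mc Laughlin, 1969). As a minimal sketch of how the match between input and response difficulty could be quantified, the following Python snippet scores both sides of an exchange with Flesch Reading Ease; the regex-based sentence splitting, the syllable heuristic, and the example exchange are our own simplifying assumptions, not the paper's exact pipeline.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping one for a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease (Kincaid et al., 1975); higher scores mean easier text."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

# Hypothetical Stack Overflow-style exchange, for illustration only.
user_input = "How do I reverse a list in Python?"
model_response = "You can call reversed() on the list, or slice it with [::-1]."

print(flesch_reading_ease(user_input))      # difficulty of the question
print(flesch_reading_ease(model_response))  # difficulty of the answer
# If a model implicitly adapts to the user, the two scores should be close.
```

In practice, a library such as textstat provides maintained implementations of these formulas; the hand-rolled version above only makes the arithmetic explicit.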
- Suha S Al-Thanyyan and Aqil M Azmi. 2021. Automated text simplification: a survey. ACM Computing Surveys (CSUR), 54(2):1–36.
- Risang Baskara et al. 2023. Exploring the implications of ChatGPT for language learning in higher education. Indonesian Journal of English Language Teaching and Applied Linguistics, 7(2):343–358.
- Andrew Caines et al. 2020. The teacher-student chatroom corpus. In Proceedings of the 9th Workshop on NLP for Computer Assisted Language Learning, pages 10–20, Gothenburg, Sweden. LiU Electronic Press.
- Chih-Ming Chen and Ching-Ju Chung. 2008. Personalized mobile English vocabulary learning system based on item response theory and learning memory cycle. Computers & Education, 51(2):624–645.
- Ramon Dijkstra et al. 2022. Reading comprehension quiz generation using generative pre-trained transformers.
- Ning Ding et al. 2023. Enhancing chat language models by scaling high-quality instructional conversations. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 3029–3051, Singapore. Association for Computational Linguistics.
- Yutao Feng et al. 2023. Sentence simplification via large language models. arXiv preprint arXiv:2302.11957.
- Quiz maker: Automatic quiz generation from text using NLP. In Futuristic Trends in Networks and Computing Technologies: Select Proceedings of Fourth International Conference on FTNCT 2021, pages 523–533. Springer.
- Gwo-Jen Hwang et al. 2010. A heuristic algorithm for planning personalized learning paths for context-aware ubiquitous learning. Computers & Education, 54(2):404–415.
- Albert Q Jiang et al. 2023. Mistral 7B. arXiv preprint arXiv:2310.06825.
- J Peter Kincaid et al. 1975. Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel.
- George R Klare. 1974. Assessing readability. Reading Research Quarterly, pages 62–102.
- Bruce W Lee and Jason Hyung-Jong Lee. 2023. Traditional readability formulas compared for English. arXiv preprint arXiv:2301.02975.
- G Harry Mc Laughlin. 1969. SMOG grading: a new readability formula. Journal of Reading, 12(8):639–646.
- Arindam Mitra et al. 2023. Orca 2: Teaching small language models how to reason. arXiv preprint arXiv:2311.11045.
- OpenAI. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774.
- Long Ouyang et al. 2022. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744.
- Rafael Rafailov et al. 2023. Direct preference optimization: Your language model is secretly a reward model. arXiv preprint arXiv:2305.18290.
- Pranav Rajpurkar et al. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, Austin, Texas. Association for Computational Linguistics.
- Donya Rooein et al. 2023. Know your audience: Do LLMs adapt to different age and education levels? arXiv preprint arXiv:2312.02065.
- Baptiste Rozière et al. 2023. Code Llama: Open foundation models for code. arXiv preprint arXiv:2308.12950.
- John Schulman et al. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- Hugo Touvron et al. 2023. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
- Hugo Touvron et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
- Guan Wang et al. 2023. OpenChat: Advancing open-source language models with mixed-quality data. arXiv preprint arXiv:2309.11235.
- Haoran Xie et al. 2019. Trends and development in technology-enhanced adaptive/personalized learning: A systematic review of journal publications from 2007 to 2017. Computers & Education, 140:103599.
- Can Xu et al. 2023. WizardLM: Empowering large language models to follow complex instructions. arXiv preprint arXiv:2304.12244.
- Tianyi Zhang et al. 2020. BERTScore: Evaluating text generation with BERT. In International Conference on Learning Representations.
- Lianmin Zheng et al. 2023. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. arXiv preprint arXiv:2306.05685.
- Banghua Zhu et al. 2023. Starling-7B: Improving LLM helpfulness & harmlessness with RLAIF.
- Banghua Zhu et al. 2023. Fine-tuning language models with advantage-induced policy alignment. arXiv preprint arXiv:2306.02231.