
Comparison of Large Language Models for Generating Contextually Relevant Questions (2407.20578v2)

Published 30 Jul 2024 in cs.CL, cs.AI, and cs.CY

Abstract: This study explores the effectiveness of LLMs for Automatic Question Generation in educational settings. Three LLMs are compared in their ability to create questions from university slide text without fine-tuning. Questions were obtained in a two-step pipeline: first, answer phrases were extracted from slides using Llama 2-Chat 13B; then, the three models generated questions for each answer. To analyze whether the questions would be suitable in educational applications for students, a survey was conducted with 46 students who evaluated a total of 246 questions across five metrics: clarity, relevance, difficulty, slide relation, and question-answer alignment. Results indicate that GPT-3.5 and Llama 2-Chat 13B outperform Flan T5 XXL by a small margin, particularly in terms of clarity and question-answer alignment. GPT-3.5 especially excels at tailoring questions to match the input answers. The contribution of this research is the analysis of the capacity of LLMs for Automatic Question Generation in education.
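The two-step pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt templates, the function names, and the idea of passing each model as a plain callable (prompt in, text out) are all assumptions made here for clarity. In the study, the extractor role would be played by Llama 2-Chat 13B and the generators by the three compared models.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical prompt templates; the prompts actually used in the study
# are not stated in the abstract.
ANSWER_PROMPT = (
    "Extract key answer phrases from the slide text below, one per line.\n\n{slide}"
)
QUESTION_PROMPT = (
    "Slide text:\n{slide}\n\nAnswer: {answer}\n\n"
    "Write one question whose answer is the phrase above."
)

# A model is represented as any function mapping a prompt string to output text.
Model = Callable[[str], str]


def extract_answers(slide: str, extractor: Model) -> List[str]:
    """Step 1: ask the extractor model for answer phrases, one per line."""
    raw = extractor(ANSWER_PROMPT.format(slide=slide))
    return [line.strip() for line in raw.splitlines() if line.strip()]


def generate_questions(
    slide: str, extractor: Model, generators: Dict[str, Model]
) -> List[Tuple[str, str, str]]:
    """Step 2: for each answer phrase, ask every generator model for a question.

    Returns (model_name, answer, question) rows.
    """
    rows = []
    for answer in extract_answers(slide, extractor):
        for name, generate in generators.items():
            prompt = QUESTION_PROMPT.format(slide=slide, answer=answer)
            rows.append((name, answer, generate(prompt).strip()))
    return rows
```

Keeping the models behind a simple callable interface means the same answer phrases are fed to every generator, which is what makes a per-answer comparison across models possible.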

Authors (6)
  1. Ivo Lodovico Molina (1 paper)
  2. Valdemar Švábenský (34 papers)
  3. Tsubasa Minematsu (2 papers)
  4. Li Chen (590 papers)
  5. Fumiya Okubo (7 papers)
  6. Atsushi Shimada (12 papers)