
EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain (2210.06104v1)

Published 12 Oct 2022 in cs.CL

Abstract: We introduce a high-quality dataset that contains 3,397 samples comprising (i) multiple choice questions, (ii) answers (including distractors), and (iii) their source documents, from the educational domain. Each question is phrased in two forms, normal and cloze. Correct answers are linked to source documents with sentence-level annotations. Thus, our versatile dataset can be used for both question and distractor generation, as well as to explore new challenges such as question format conversion. Furthermore, 903 questions are accompanied by their cognitive complexity level as per Bloom's taxonomy. All questions have been generated by educational experts rather than crowd workers to ensure that they maintain educational and learning standards. Our analysis and experiments suggest distinguishable differences between our dataset and commonly used ones for question generation for educational purposes. We believe this new dataset can serve as a valuable resource for research and evaluation in the educational domain. The dataset and baselines will be released to support further research in question generation.
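To make the sample structure described in the abstract concrete, here is a minimal sketch of how one record might be represented in Python. The class name, field names, and example values are all hypothetical illustrations; they are not the released dataset's actual schema or contents.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MCQSample:
    """One multiple-choice sample. All field names here are assumptions."""
    question_normal: str                # question phrased as a regular question
    question_cloze: str                 # same question as a fill-in-the-blank
    correct_answer: str
    distractors: List[str]
    source_document: List[str]          # source text, split into sentences
    answer_sentence_ids: List[int]      # sentence-level links for the answer
    bloom_level: Optional[str] = None   # only a subset of questions has one

    def options(self) -> List[str]:
        """Return all answer options, correct answer first."""
        return [self.correct_answer] + self.distractors

    def supporting_sentences(self) -> List[str]:
        """Return the source sentences annotated as supporting the answer."""
        return [self.source_document[i] for i in self.answer_sentence_ids]


# Invented example record, not taken from the dataset.
sample = MCQSample(
    question_normal="Which organelle produces most of a cell's ATP?",
    question_cloze="Most of a cell's ATP is produced by the _____.",
    correct_answer="Mitochondrion",
    distractors=["Ribosome", "Golgi apparatus", "Lysosome"],
    source_document=[
        "Cells contain many organelles.",
        "Mitochondria produce most of the cell's ATP.",
    ],
    answer_sentence_ids=[1],
    bloom_level="Remember",
)
```

Under this layout, question generation would pair `supporting_sentences()` with either question form, distractor generation would pair a question and its correct answer with `distractors`, and format conversion would map `question_normal` to `question_cloze`.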

Authors (5)
  1. Amir Hadifar
  2. Semere Kiros Bitew
  3. Johannes Deleu
  4. Chris Develder
  5. Thomas Demeester
Citations (13)
