Reading Comprehension Exercise Generation

Updated 1 December 2025
  • RCEG is defined as the automated creation of reading comprehension tasks such as questions, answers, and distractors, enhancing literacy assessment.
  • It leverages transformer-based models and modular pipelines to control skill, difficulty, and content coverage across multiple exercise types.
  • Current trends address personalization, multilingual support, and rigorous evaluation using automated metrics and human expert review.

Reading Comprehension Exercise Generation (RCEG) comprises the automated creation of reading comprehension tasks—including questions, answers, and distractors—given input passages. These tasks target assessment, instruction, and research in literacy and language learning. Over the past decade, RCEG has advanced from pattern-based pipelines to transformer-based LLMs with controllability for skill, difficulty, and content coverage, supporting open-ended, fill-in-the-blank, and multiple-choice formats across diverse languages and reading levels.

1. Problem Formulation and Scope

RCEG is formally defined as the process of mapping an input document or passage $D$ to a set of reading comprehension exercises $Q = \{q_1, \dots, q_K\}$, where each $q_i$ may be a question-answer pair, a multiple-choice question (MCQ), or another test item type, together with supporting distractors when required. The system is required to jointly maximize several objectives: coverage of key content elements, diversity of question types, appropriateness of difficulty or skill calibration, as well as syntactic, semantic, and pedagogical quality (Yang et al., 30 Jul 2025, Huang et al., 24 Nov 2025).
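
One schematic way to express these joint objectives as a single selection criterion (the weights $\lambda_j$ and the term names are illustrative, not drawn from the cited formulations) is:

$$ Q^{*} = \arg\max_{Q}\; \lambda_1\,\mathrm{Cov}(Q, D) + \lambda_2\,\mathrm{Div}(Q) + \lambda_3\,\mathrm{Cal}(Q) + \lambda_4\,\mathrm{Qual}(Q), $$

where $\mathrm{Cov}$ measures coverage of key content in $D$, $\mathrm{Div}$ the diversity of question types, $\mathrm{Cal}$ difficulty/skill calibration, and $\mathrm{Qual}$ syntactic, semantic, and pedagogical quality.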

Recent frameworks express this as a modular, skill- and difficulty-conditioned sequence-to-sequence problem, often parameterized as $p_\theta(q \mid C, s, a, \ell)$, where $C$ is the context, $s$ a comprehension skill, $a$ an answer (if specified), and $\ell$ a difficulty level (Wang et al., 2023, Kumar et al., 2023).
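
In practice, such conditioning is often realized by prepending control fields to the input of a fine-tuned seq2seq model. The sketch below assumes an off-the-shelf Hugging Face model and a hypothetical `skill:`/`difficulty:` prompt format; the cited systems use their own control-token schemes.

```python
# Minimal sketch of skill- and difficulty-conditioned question generation,
# i.e., sampling from p_theta(q | C, s, a, l) with a seq2seq LM.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "google/flan-t5-base"  # any instruction-tuned seq2seq model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def generate_questions(context: str, skill: str, difficulty: str,
                       answer: str | None = None, n: int = 4) -> list[str]:
    """Sample n candidate questions conditioned on skill, difficulty,
    and (optionally) a target answer via a control-field prompt."""
    prompt = f"skill: {skill} difficulty: {difficulty} "
    if answer is not None:
        prompt += f"answer: {answer} "
    prompt += f"context: {context} Generate a reading comprehension question."
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, do_sample=True, top_p=0.9,
                             num_return_sequences=n, max_new_tokens=64)
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

candidates = generate_questions(
    context="The Nile flows north through eleven countries before reaching "
            "the Mediterranean Sea.",
    skill="literal", difficulty="easy")
```

Sampling several candidates per call feeds naturally into the overgenerate-and-rank selection described in the pipeline below.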

2. Model Architectures and Generation Pipelines

RCEG systems follow a variety of architectures depending on subtask specialization, as tabulated below:

| Subsystem | Model Examples | Key Mechanisms |
| --- | --- | --- |
| Question Generation | T5, Flan-T5, BART, Llama | Seq2Seq transformers, answer conditioning, skill/difficulty prompts (Kumar et al., 2023, Yang et al., 30 Jul 2025, Wang et al., 2023) |
| Answer Generation | Extractive or generative QA models | Pointer networks, span prediction (Kumar et al., 2018) |
| Distractor Generation | Hierarchical encoder–decoder, GPT, PLMs | Static/dynamic attention, mask-based decoding, knowledge-based ranking (Lin et al., 29 May 2024, Gao et al., 2018, Zhang, 2023) |
| Exercise Selection | Discriminator, overgenerate-and-rank | Perplexity/DM rankers, reward models (Huang et al., 24 Nov 2025, Kumar et al., 2023) |

A prototypical pipeline consists of:

  1. Preprocessing and Content Selection: Tokenization, semantic/syntactic tagging, and content segmentation for summarization and coverage (Yang et al., 30 Jul 2025, Zhang, 2023).
  2. Candidate Generation: Fine-tuned transformer models generate questions, answers, and distractors under controlled prompts for skill, difficulty, and type (Kumar et al., 2023, Wang et al., 2023, Lin et al., 29 May 2024).
  3. Filtering and Selection: Overgenerate-and-rank frameworks sample multiple candidates and apply scoring models to select high-quality, pedagogically aligned items (Kumar et al., 2023, Huang et al., 24 Nov 2025); a minimal sketch of this stage appears after the list.
  4. Post-hoc Control and Filtering: Dynamic attribute graph (DATG) reweighting, GeDi-based toxicity filtering, and heuristics for answer-in-question, length, and answerability (Huang et al., 24 Nov 2025, Zhang, 2023).
  5. Output Integration: Assembling validated (question, correct answer, distractors) sets for end-use (Zhang, 2023, Lin et al., 29 May 2024).
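
A minimal sketch of steps 3-4, assuming candidate (question, answer) pairs from the generator and any scoring function (e.g., LM perplexity, where lower is better); the heuristics and thresholds here are illustrative, not taken from the cited systems:

```python
# Overgenerate-and-rank: drop candidates failing simple heuristics
# (length bounds, answer-in-question leakage, malformed questions),
# then keep the k best-scored survivors.
from typing import Callable

def passes_heuristics(question: str, answer: str,
                      min_len: int = 5, max_len: int = 40) -> bool:
    """Reject answer-leaking, too-short, too-long, or malformed questions."""
    n_tokens = len(question.split())
    if not min_len <= n_tokens <= max_len:
        return False
    if answer.lower() in question.lower():  # answer leaks into the question
        return False
    return question.strip().endswith("?")

def select_exercises(candidates: list[tuple[str, str]],
                     score: Callable[[str], float],
                     k: int = 3) -> list[tuple[str, str]]:
    """candidates: (question, answer) pairs; score: e.g., LM perplexity
    (lower is better). Returns the k best candidates that pass filtering."""
    kept = [qa for qa in candidates if passes_heuristics(*qa)]
    kept.sort(key=lambda qa: score(qa[0]))
    return kept[:k]
```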

3. Exercise Types, Skill and Difficulty Control

RCEG covers a spectrum of exercise types:

  • Literal, Inferential, and Bridging-Inference Question Generation: Classification by the type of cognitive operation required (e.g., retrieval, gap-filling, reference resolution) (Ma et al., 9 Jun 2025, Ghanem et al., 2022).
  • Skill-Conditioned Generation: Systems such as SkillQG (Wang et al., 2023) and HTA-WTA (Ghanem et al., 2022) enforce targeting of Bloom’s taxonomy-derived skills or story-based categories by including explicit skill tokens and stepwise prompting for question focus and background knowledge.
  • Difficulty Controllability: Fine-grained control of difficulty is achieved by tailored prompts, question templates, or supervised learning with difficulty labels, especially in multi-level educational contexts (Gao et al., 2018, Yang et al., 30 Jul 2025).

Difficulty and skill conditioning is operationalized via explicit skill tokens, stepwise prompting, tailored question templates, and supervised fine-tuning with difficulty labels, as outlined above.
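
In the common maximum-likelihood setting, this conditioning is learned by fine-tuning on a corpus $\mathcal{D}$ of items annotated with skill and difficulty labels; the objective below is a schematic sketch rather than a formula from any single cited paper:

$$ \mathcal{L}(\theta) = -\sum_{(C,\,s,\,a,\,\ell,\,q)\,\in\,\mathcal{D}} \log p_\theta(q \mid C, s, a, \ell). $$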

4. Distractor Generation and MCQ Expansion

Distractor generation for MCQs is addressed via:

  • Hierarchical Encoder–Decoder Networks: Systems model both sentence- and word-level dependencies to generate semantically plausible distractor options, leveraging static and dynamic attention to avoid answer overlap and promote contextual relevance (Gao et al., 2018).
  • Mask-based and Multi-task Learning (DGRC): Hard chain-of-thought reasoning, sequential and end-to-end mask decoding, and multi-task fine-tuning yield significant performance improvement, especially for context-sensitive, exam-style distractors (Lin et al., 29 May 2024).
  • Hybrid NLP and Knowledge Approaches: Lexical, semantic, and named-entity-based candidate gathering and scoring, including knowledge base lookups, embedding similarities, and edit distance heuristics (Zhang, 2023).
  • Filtering and Diversity Enforcement: Jaccard distance, distractor order shuffling, and ranking mechanisms to maximize diversity and plausibility (Lin et al., 29 May 2024, Gao et al., 2018); a minimal sketch of the Jaccard filter follows this list.
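
A minimal sketch of the Jaccard-distance diversity filter, greedily keeping each candidate only if it is sufficiently dissimilar from the correct answer and from previously kept distractors (the 0.5 threshold and token-level Jaccard are illustrative choices):

```python
# Greedy diversity filtering of distractor candidates via Jaccard distance.
def jaccard_distance(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)

def diverse_distractors(answer: str, candidates: list[str],
                        k: int = 3, min_dist: float = 0.5) -> list[str]:
    kept: list[str] = []
    for cand in candidates:  # assume candidates are pre-ranked by plausibility
        far_from_answer = jaccard_distance(cand, answer) >= min_dist
        far_from_kept = all(jaccard_distance(cand, d) >= min_dist for d in kept)
        if far_from_answer and far_from_kept:
            kept.append(cand)
        if len(kept) == k:
            break
    return kept
```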

5. Evaluation Metrics and Experimental Protocols

Quality assessment in RCEG integrates both automatic and human evaluation protocols: automated metrics compare generated items against reference exercises, while human experts judge answerability, skill alignment, and pedagogical quality.
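
N-gram overlap metrics such as BLEU are standard automatic proxies in question generation; the sketch below uses NLTK, with the metric choice being illustrative, and overlap scores complementing rather than replacing human review:

```python
# BLEU-4 between a reference question and a generated question.
# Smoothing avoids zero scores when higher-order n-grams do not match.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu4(reference: str, hypothesis: str) -> float:
    smooth = SmoothingFunction().method1
    return sentence_bleu([reference.split()], hypothesis.split(),
                         smoothing_function=smooth)

score = bleu4("In which direction does the Nile flow?",
              "What direction does the Nile flow in?")
print(f"BLEU-4: {score:.3f}")
```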

Notable limitations include incomplete skill/inference type alignment (e.g., only 42.6% inference-type match in automatic bridging-inference QG (Ma et al., 9 Jun 2025)), dependence on strong pretrained LLMs, and sensitivity to prompt engineering or data domain shift. Methods to directly optimize text informativity and reduce guessability (e.g., integrating TI as a reinforcement learning reward) are proposed as future enhancements (Säuberli et al., 11 Apr 2024).

Advances in dynamic coverage optimization, student modeling, chain-of-thought prompting, and domain adaptation will further refine automated RCEG, facilitating scalable, effective literacy assessment in multi-modal, multi-lingual, and adaptive learning environments.
