GenQuest: LLM-Driven Adaptive Quest Generation

Updated 15 November 2025
  • GenQuest is a system that generates dynamic, branching text-adventures using large language models to create adaptive narrative quests tailored for language learning.
  • It integrates a narrative LLM for plot generation with a vocabulary assistant LLM to provide proficiency-based vocabulary explanations, ensuring semantic coherence.
  • Empirical evaluation shows improved vocabulary outcomes and high user satisfaction, while highlighting opportunities for multilingual and multimodal enhancements.

GenQuest refers to a class of systems and methodologies leveraging LLMs to generate dynamic, branching, context-adaptive quests in text-adventure environments. Originating in research on procedural quest generation and recently applied to language learning, GenQuest synthesizes story-driven interaction, semantic coherence, and adaptive pedagogy. Its architecture tightly integrates an LLM-based narrative engine, proficiency adaptation, and auxiliary vocabulary support, establishing a model for both entertainment and educational domains.

1. System Design and Architectural Overview

GenQuest is implemented as a web application with a modular architecture. The frontend is developed in Vue.js and provides genre selection, proficiency-level text variants (sampled according to CEFR levels), interactive decision-point menus, and an interface for querying vocabulary explanations. The Python Flask backend maintains a persistent memory store comprising the user's CEFR selection, a structured narrative outline (composed of milestones $M_i$, decision points $D_j$, and possible endings $E$), and a running summary of previous plot segments.
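
A minimal sketch of this memory store as Python dataclasses; the field names and types are illustrative assumptions, since the paper specifies the contents but not a schema:

from dataclasses import dataclass, field

@dataclass
class Outline:
    milestones: list[str]                   # M_0 ... M_N
    decision_points: dict[int, list[str]]   # D_j -> its k_j plot options
    ending: str                             # E, the (eventual) ending

@dataclass
class SessionMemory:
    user_level: str                         # selected CEFR level, e.g. "B1"
    outline: Outline                        # structured narrative outline
    summaries: list[str] = field(default_factory=list)  # running plot summaries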

Two commercial LLMs are used in tandem:

  • Story Generation LLM (Claude 3.7 Sonnet):
    • Handles text sampling, outline generation, plot continuation, and summarization.
    • Generates parallel story beginnings at CEFR levels A1–C2, enabling the learner to select their preferred version.
    • Produces structured outlines of the form $\{ M_0, D_1 : [\mathrm{opt}_1, \ldots, \mathrm{opt}_{k_1}], M_1, \ldots, E \}$.
    • At each plot continuation step $t$, constructs a context $c_t$ from the outline, the summaries, and the last decision, and generates a segment $y_t = \mathrm{LM}_{\text{plot}}(c_t)$.
  • Vocabulary Assistant LLM (GPT-4o):
    • For a highlighted text string $x$, provides an in-context, proficiency-tailored explanation $y_{\text{vocab}} = \mathrm{LM}_{\text{vocab}}(x, \text{context}, \text{CEFR level})$.

High-level pseudocode illustrates the workflow:

user_genre, user_hint = frontend.get_inputs()
# Sample one opening-scene variant per CEFR level A1-C2
openings = Claude37.sample_proficiency(user_genre, user_hint)
user_level = frontend.select(["A1", "A2", "B1", "B2", "C1", "C2"])
outline = Claude37.generate_outline(user_genre, user_level)
memory = {"user_level": user_level, "outline": outline, "summaries": []}

last_user_choice = None
while not at_final_milestone(memory["outline"]):
    c_t = build_context(memory, last_user_choice)   # outline + summaries + last decision
    y_t = Claude37.generate_plot(c_t)
    frontend.display(y_t)
    s_t = Claude37.summarize(y_t)
    memory["summaries"].append(s_t)
    options = extract_decision_options(y_t)
    last_user_choice = frontend.select(options)

c_end = build_context(memory, last_user_choice)
y_end = Claude37.generate_ending(c_end)
frontend.display(y_end)

2. Narrative and Game Mechanics

GenQuest employs a “converging checkpoint” structure for dynamic narrative assembly: each outline consists of milestones $M_0, \ldots, M_N$ interleaved with branching decision points $D_j$, where each $D_j$ offers $k_j$ plot options. Branches reconverge at the next milestone, guaranteeing narrative coherence while enabling user-driven plot development.

Branching probability at $D_j$ is parameterized as $P_b(j) = 1 - 1/k_j$, with the backend controlling $k_j$ per outline. After each segment, only the $K$ most recent summaries are retained ($K \approx 5$–$7$), limiting context length while maintaining coherence. Optionally, a relevance function $R(c, y) = \mathrm{cossim}(\mathrm{embed}(c), \mathrm{embed}(y))$, the cosine similarity between the embeddings of the prior context and the candidate segment, ensures plot alignment, with automated filtering whenever $R < \tau$.
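
A minimal sketch of the summary window and optional relevance filter, assuming a generic sentence-embedding function embed; the constants are illustrative, since the source reports only K ≈ 5-7 and does not fix τ:

import numpy as np

K = 6      # retain only the K most recent summaries (paper: K ≈ 5-7)
TAU = 0.5  # illustrative relevance threshold τ; not reported in the source

def cossim(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def trim_summaries(summaries: list[str]) -> list[str]:
    # Keep the context window bounded to the K most recent summaries
    return summaries[-K:]

def passes_relevance(context: str, segment: str, embed) -> bool:
    # R(c, y) = cossim(embed(c), embed(y)); filter the segment when R < τ
    return cossim(embed(context), embed(segment)) >= TAU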

3. Proficiency Adaptation and Pedagogical Foundation

The system adapts its linguistic complexity via two mechanisms:

  • CEFR-Level Sampling: Claude 3.7 Sonnet is prompted: “Generate six versions of the opening scene in simple English (A1), elementary (A2), ... up to proficient (C2).” For $L = \{\text{A1}, \ldots, \text{C2}\}$ and each level $\ell \in L$, the system generates $x_\ell = \mathrm{LM}_{\text{sample}}(\text{prompt}, \text{complexity} = \ell)$. This maps proficiency constraints onto vocabulary frequency and syntactic complexity, guided by CEFR descriptors in the prompts.
  • Vocabulary Assistant: For any highlighted string $s$, the frontend sends $(s, \text{context}, \text{user\_level})$ to GPT-4o, which returns candidate explanations $\{e_1, \ldots, e_n\}$ ranked via $\text{Score}(e_i) = \alpha \cdot \cos(\mathrm{embed}(e_i), \mathrm{embed}(\text{context})) - \beta \cdot \mathrm{perplexity}_{LM}(e_i)$, where $\alpha$ and $\beta$ are tuned for contextual relevance and linguistic simplicity (see the scoring sketch after this list). Explanations align with the target CEFR level, and each $(s, e)$ pair is persistently stored for learner review.
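
A sketch of this ranking rule under stated assumptions: embed and perplexity stand in for an embedding model and an LM perplexity scorer, and the weights shown are placeholders rather than the tuned values, which the source does not report:

import numpy as np

ALPHA, BETA = 1.0, 0.1  # illustrative weights; tuned values are not reported

def cossim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def score(e: str, context: str, embed, perplexity) -> float:
    # Score(e_i) = α · cos(embed(e_i), embed(context)) - β · perplexity(e_i)
    return ALPHA * cossim(embed(e), embed(context)) - BETA * perplexity(e)

def best_explanation(candidates: list[str], context: str, embed, perplexity) -> str:
    # Return the highest-scoring candidate explanation
    return max(candidates, key=lambda e: score(e, context, embed, perplexity))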

4. Empirical Evaluation: Pilot Study Analysis

A five-day within-subject pilot with nine Chinese undergraduates (CEFR A2–B1) assessed vocabulary acquisition and user perception. Each session involved a full quest playthrough and logged vocabulary queries.

  • Vocabulary Test: After five days, the most frequently queried words were consolidated into a 20-item vocabulary test. Scoring: 1 = correct, 0.5 = partial, 0 = incorrect. Mean score: $M = 13.44$ ($\approx 67\%$), $SD = 4.62$, range 6–18.5. Higher CEFR selections correlated with slightly higher scores, but significant individual variation was observed.
  • User Perception Survey (TAM): Twelve items (six Perceived Usefulness, six Perceived Ease of Use) on 7-point scales. PU mean: 5.33 ($SD = 1.02$, Cronbach's $\alpha = 0.92$); PEOU mean: 5.85 ($SD = 0.81$, $\alpha = 0.84$; see the sketch after this list). Highest-rated items: “The game interface is easy to understand” ($M = 6.00$) and “I find the game easy to use overall” ($M = 6.44$).
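
For reference, the internal-consistency statistic $\alpha$ reported above is Cronbach's alpha, computable from the raw ratings as follows (a generic sketch of the standard formula, not the authors' analysis code):

import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    # ratings: respondents × items matrix (here 9 × 6 per TAM subscale)
    k = ratings.shape[1]
    item_variances = ratings.var(axis=0, ddof=1).sum()   # Σ σ²_i over items
    total_variance = ratings.sum(axis=1).var(ddof=1)     # σ² of respondent totals
    return (k / (k - 1)) * (1 - item_variances / total_variance)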

5. Qualitative Feedback and System Refinement

Open-ended responses (N=9) were thematically coded:

  • Vocabulary support (n = 5): need for L1 (Chinese) glosses, collocations and multiple senses, usage examples, and spaced repetition.
  • Narrative quality (n = 3): logical inconsistencies, excessive passage length, and better results when users supplied their own prompts.
  • User experience (n = 3): slow loading (> 1 min), a dry interface, requests for illustrations, and a desire for puzzle-like interactivity.
  • Difficulty adjustment (n = 2): finer CEFR granularity and a pre-test to auto-assign the optimal level.

This suggests a need for multilingual scaffolding, improved narrative consistency, multimodal content, more granular proficiency mapping, and greater interactivity.

6. Limitations and Future Directions

The pilot suggests GenQuest is effective for incidental vocabulary acquisition and for sustaining user motivation. Key areas for improvement include:

  • Enhanced CEFR alignment (lexical filters, explicit readability metrics such as LIX or Flesch–Kincaid; see the sketch after this list).
  • Multimodal capability (integrating image models, handling consistency).
  • Bilingual scaffolding (L1 glosses, integrated dictionaries).
  • Expanded assessment (in-game quizzes, comprehension checks).
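
As one concrete option for the readability-metric item above, generated segments could be scored with the Flesch–Kincaid grade level and regenerated when they drift from the target band. This is a hedged sketch with a crude vowel-group syllable estimator, not part of the current system:

import re

def fk_grade(text: str) -> float:
    # Flesch-Kincaid grade = 0.39·(words/sentences) + 11.8·(syllables/words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    # Approximate syllables as runs of vowels (crude but dependency-free)
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59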

Future research will expand participant diversity, enable longer-term tracking of learning outcomes, and adapt the architecture to other target languages with appropriate base models. Measurable vocabulary gains ($\approx 67\%$ correct on the post-test), high user satisfaction, and substantive learner suggestions establish GenQuest as a reference system for LLM-driven text-adventure applications in second language acquisition contexts (Wang et al., 6 Oct 2025).
