GenQuest: LLM-Driven Adaptive Quest Generation

Updated 15 November 2025
  • GenQuest is a system that generates dynamic, branching text-adventures using large language models to create adaptive narrative quests tailored for language learning.
  • It integrates a narrative LLM for plot generation with a vocabulary assistant LLM to provide proficiency-based vocabulary explanations, ensuring semantic coherence.
  • Empirical evaluation shows improved vocabulary outcomes and high user satisfaction, while highlighting opportunities for multilingual and multimodal enhancements.

GenQuest refers to a class of systems and methodologies leveraging LLMs to generate dynamic, branching, context-adaptive quests in text-adventure environments. Originating in research on procedural quest generation and recently applied to language learning, GenQuest synthesizes story-driven interaction, semantic coherence, and adaptive pedagogy. Its architecture tightly integrates an LLM-based narrative engine, proficiency adaptation, and auxiliary vocabulary support, establishing a model for both entertainment and educational domains.

1. System Design and Architectural Overview

GenQuest is implemented as a web application with a modular architecture. The frontend is developed in Vue.js and provides genre selection, proficiency-level text variants (sampled according to CEFR levels), interactive decision-point menus, and an interface for querying vocabulary explanations. The Python Flask backend maintains a persistent memory store comprising the user's CEFR selection, a structured narrative outline (composed of milestones $M_i$, decision points $D_j$, and possible endings $E$), and a running summary of previous plot segments.
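
A minimal sketch of this memory store as Python dataclasses; the field names and types are illustrative assumptions, since the paper specifies the contents but not a schema:

from dataclasses import dataclass, field

@dataclass
class Outline:
    milestones: list[str]                   # M_0 ... M_N
    decision_points: dict[int, list[str]]   # D_j -> its k_j plot options
    ending: str                             # E, the (eventual) ending

@dataclass
class SessionMemory:
    user_level: str                         # selected CEFR level, e.g. "B1"
    outline: Outline                        # structured narrative outline
    summaries: list[str] = field(default_factory=list)  # running plot summaries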

Two commercial LLMs are used in tandem:

  • Story Generation LLM (Claude 3.7 Sonnet):
    • Handles text sampling, outline generation, plot continuation, and summarization.
    • Generates parallel story beginnings at CEFR levels A1–C2, enabling the learner to select their preferred version.
    • Produces structured outlines of the form $\{ M_0, D_1 : [\mathrm{opt}_1, \ldots, \mathrm{opt}_{k_1}], M_1, \ldots, E \}$.
    • At each plot continuation step $t$, constructs a context $c_t$ from the outline, the summaries, and the last decision, and generates a segment $y_t = \mathrm{LM}_{\text{plot}}(c_t)$.
  • Vocabulary Assistant LLM (GPT-4o):
    • For a highlighted text string $x$, provides an in-context, proficiency-tailored explanation $y_{\text{vocab}} = \mathrm{LM}_{\text{vocab}}(x, \text{context}, \text{CEFR level})$.

High-level pseudocode illustrates the workflow:

user_genre, user_hint = frontend.get_inputs()
# Sample one opening-scene variant per CEFR level A1-C2
openings = Claude37.sample_proficiency(user_genre, user_hint)
user_level = frontend.select(["A1", "A2", "B1", "B2", "C1", "C2"])
outline = Claude37.generate_outline(user_genre, user_level)
memory = {"user_level": user_level, "outline": outline, "summaries": []}

last_user_choice = None
while not at_final_milestone(memory["outline"]):
    c_t = build_context(memory, last_user_choice)   # outline + summaries + last decision
    y_t = Claude37.generate_plot(c_t)
    frontend.display(y_t)
    s_t = Claude37.summarize(y_t)
    memory["summaries"].append(s_t)
    options = extract_decision_options(y_t)
    last_user_choice = frontend.select(options)

c_end = build_context(memory, last_user_choice)
y_end = Claude37.generate_ending(c_end)
frontend.display(y_end)

2. Narrative and Game Mechanics

GenQuest employs a “converging checkpoint” structure for dynamic narrative assembly: each outline consists of milestones $M_0, \ldots, M_N$ interleaved with branching decision points $D_j$, where each $D_j$ offers $k_j$ plot options. Branches reconverge at the next milestone, guaranteeing narrative coherence while enabling user-driven plot development.

Branching probability at $D_j$ is parameterized as $P_b(j) = 1 - 1/k_j$, with the backend controlling $k_j$ per outline. After each segment, only the $K$ most recent summaries are retained ($K \approx 5$–$7$), limiting context length while maintaining coherence. Optionally, a relevance function $R(c, y) = \mathrm{cossim}(\mathrm{embed}(c), \mathrm{embed}(y))$, the cosine similarity between the embeddings of the prior context and the candidate segment, ensures plot alignment, with automated filtering whenever $R < \tau$.
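
A minimal sketch of the summary window and optional relevance filter, assuming a generic sentence-embedding function embed; the constants are illustrative, since the source reports only K ≈ 5-7 and does not fix τ:

import numpy as np

K = 6      # retain only the K most recent summaries (paper: K ≈ 5-7)
TAU = 0.5  # illustrative relevance threshold τ; not reported in the source

def cossim(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def trim_summaries(summaries: list[str]) -> list[str]:
    # Keep the context window bounded to the K most recent summaries
    return summaries[-K:]

def passes_relevance(context: str, segment: str, embed) -> bool:
    # R(c, y) = cossim(embed(c), embed(y)); filter the segment when R < τ
    return cossim(embed(context), embed(segment)) >= TAU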

3. Proficiency Adaptation and Pedagogical Foundation

The system adapts its linguistic complexity via two mechanisms:

  • CEFR-Level Sampling: Claude 3.7 Sonnet is prompted: “Generate six versions of the opening scene in simple English (A1), elementary (A2), ... up to proficient (C2).” For $L = \{\text{A1}, \ldots, \text{C2}\}$ and each level $\ell \in L$, the system generates $x_\ell = \mathrm{LM}_{\text{sample}}(\text{prompt}, \text{complexity} = \ell)$. This maps proficiency constraints onto vocabulary frequency and syntactic complexity, guided by CEFR descriptors in the prompts.
  • Vocabulary Assistant: For any highlighted string $s$, the frontend sends $(s, \text{context}, \text{user\_level})$ to GPT-4o, which returns candidate explanations $\{e_1, \ldots, e_n\}$ ranked via $\text{Score}(e_i) = \alpha \cdot \cos(\mathrm{embed}(e_i), \mathrm{embed}(\text{context})) - \beta \cdot \mathrm{perplexity}_{LM}(e_i)$, where $\alpha$ and $\beta$ are tuned for contextual relevance and linguistic simplicity (see the scoring sketch after this list). Explanations align with the target CEFR level, and each $(s, e)$ pair is persistently stored for learner review.
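
A sketch of this ranking rule under stated assumptions: embed and perplexity stand in for an embedding model and an LM perplexity scorer, and the weights shown are placeholders rather than the tuned values, which the source does not report:

import numpy as np

ALPHA, BETA = 1.0, 0.1  # illustrative weights; tuned values are not reported

def cossim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def score(e: str, context: str, embed, perplexity) -> float:
    # Score(e_i) = α · cos(embed(e_i), embed(context)) - β · perplexity(e_i)
    return ALPHA * cossim(embed(e), embed(context)) - BETA * perplexity(e)

def best_explanation(candidates: list[str], context: str, embed, perplexity) -> str:
    # Return the highest-scoring candidate explanation
    return max(candidates, key=lambda e: score(e, context, embed, perplexity))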

4. Empirical Evaluation: Pilot Study Analysis

A five-day within-subject pilot with nine Chinese undergraduates (CEFR A2–B1) assessed vocabulary acquisition and user perception. Each session involved a full quest playthrough and logged vocabulary queries.

  • Vocabulary Test: After five days, the most frequently queried words were consolidated into a 20-item vocabulary test. Scoring: 1 = correct, 0.5 = partial, 0 = incorrect. Mean score: $M = 13.44$ ($\approx 67\%$), $SD = 4.62$, range 6–18.5. Higher CEFR selections correlated with slightly higher scores, but significant individual variation was observed.
  • User Perception Survey (TAM): Twelve items (six Perceived Usefulness, six Perceived Ease of Use) on 7-point scales. PU mean: 5.33 ($SD = 1.02$, Cronbach's $\alpha = 0.92$); PEOU mean: 5.85 ($SD = 0.81$, $\alpha = 0.84$; see the sketch after this list). Highest-rated items: “The game interface is easy to understand” ($M = 6.00$) and “I find the game easy to use overall” ($M = 6.44$).
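
For reference, the internal-consistency statistic $\alpha$ reported above is Cronbach's alpha, computable from the raw ratings as follows (a generic sketch of the standard formula, not the authors' analysis code):

import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    # ratings: respondents × items matrix (here 9 × 6 per TAM subscale)
    k = ratings.shape[1]
    item_variances = ratings.var(axis=0, ddof=1).sum()   # Σ σ²_i over items
    total_variance = ratings.sum(axis=1).var(ddof=1)     # σ² of respondent totals
    return (k / (k - 1)) * (1 - item_variances / total_variance)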

5. Qualitative Feedback and System Refinement

Open-ended responses (N=9) were thematically coded:

  • Vocabulary support (n = 5): need for L1 (Chinese) glosses, collocations and multiple senses, usage examples, and spaced repetition.
  • Narrative quality (n = 3): logical inconsistencies, excessive passage length, and better results when users supplied their own prompts.
  • User experience (n = 3): slow loading (> 1 min), a dry interface, requests for illustrations, and a desire for puzzle-like interactivity.
  • Difficulty adjustment (n = 2): finer CEFR granularity and a pre-test to auto-assign the optimal level.

This suggests a need for multilingual scaffolding, improved narrative consistency, multimodal content, more granular proficiency mapping, and greater interactivity.

6. Limitations and Future Directions

The pilot suggests GenQuest is effective for incidental vocabulary acquisition and for sustaining user motivation. Key areas for improvement include:

  • Enhanced CEFR alignment (lexical filters, explicit readability metrics such as LIX or Flesch–Kincaid; see the sketch after this list).
  • Multimodal capability (integrating image models, handling consistency).
  • Bilingual scaffolding (L1 glosses, integrated dictionaries).
  • Expanded assessment (in-game quizzes, comprehension checks).
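
As one concrete option for the readability-metric item above, generated segments could be scored with the Flesch–Kincaid grade level and regenerated when they drift from the target band. This is a hedged sketch with a crude vowel-group syllable estimator, not part of the current system:

import re

def fk_grade(text: str) -> float:
    # Flesch-Kincaid grade = 0.39·(words/sentences) + 11.8·(syllables/words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    # Approximate syllables as runs of vowels (crude but dependency-free)
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59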

Future research will expand participant diversity, enable longer-term tracking of learning outcomes, and adapt the architecture to other target languages with appropriate base models. Measurable vocabulary gains ($\approx 67\%$ correct on the post-test), high user satisfaction, and substantive learner suggestions establish GenQuest as a reference system for LLM-driven text-adventure applications in second language acquisition contexts (Wang et al., 6 Oct 2025).
