TELeR Taxonomy for LLM Prompt Control
- The TELeR taxonomy is a hierarchical framework that categorizes and controls prompt detail for LLM applications in educational simulations.
- It employs four cumulative levels to balance model flexibility with structured guidance, improving format adherence and instructional alignment.
- Empirical results show that well-chosen TELeR levels, notably Levels 2–3, enhance syntactic validity and pedagogical quality.
The TELeR taxonomy is a hierarchical framework designed to systematically categorize and control prompt detail for LLM applications, with particular utility for simulation-aligned, teacher-driven question generation in virtual lab educational settings. Serving as a core prompt engineering component, TELeR modulates the granularity of instructional context and output expectations supplied to the LLM, directly influencing both the syntactic validity and pedagogical quality of generated questions. The sections below outline the key facets of the TELeR taxonomy and its integration within instructional-goal-aligned question generation.
1. Structure and Levels of the TELeR Taxonomy
The TELeR taxonomy defines four prompt detail levels, each representing a distinct configuration for controlling the LLM’s generation behavior:
| TELeR Level | Description of Prompt Detail | Pedagogical Effect |
|---|---|---|
| Level 1 | Minimal guidance (bare instruction) | Maximizes model flexibility; minimal structural cues |
| Level 2 | Additional detail (clarifications, minimal cues) | Balances guidance with creative latitude |
| Level 3 | Structured list of considerations | Explicit constraints on knowledge units, relationships |
| Level 4 | Ideal output characteristics specified | Strict adherence to pedagogical and syntactic targets |
Levels are cumulative: each higher level incorporates the directives of all lower levels and adds further constraints or clarifications. For instance, Level 3 prompts specify particular knowledge units or relationships to emphasize, instructing the LLM to focus on specific lab concepts and pedagogical goals, while Level 4 adds explicit output requirements to enforce format compliance and response clarity.
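To make the cumulative construction concrete, the sketch below shows one way the four levels could be assembled in practice; the directive strings and function name are illustrative placeholders rather than the paper's actual prompts.

```python
# Illustrative sketch of cumulative TELeR prompt construction.
# The directive texts are hypothetical stand-ins, not the paper's exact wording.
LEVEL_DIRECTIVES = {
    1: "Generate a question about the virtual lab simulation.",
    2: "The question should target the stated learning objective and use the lab's terminology.",
    3: ("Consider explicitly: (a) the listed knowledge units, (b) the relationships "
        "between them, and (c) the teacher's pedagogical goal."),
    4: ("Ideal output: a single JSON object with 'question', 'answer_type', and 'blank' "
        "fields; the question must be fluent, unambiguous, and answerable from the lab."),
}

def build_teler_prompt(level: int) -> str:
    """Compose a prompt by concatenating directives from Level 1 up to `level`."""
    if not 1 <= level <= 4:
        raise ValueError("TELeR level must be between 1 and 4")
    return "\n".join(LEVEL_DIRECTIVES[i] for i in range(1, level + 1))

print(build_teler_prompt(3))  # a Level 3 prompt also carries the Level 1 and 2 directives
```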
2. Interaction with Instructional Goal and Lab Understanding
TELeR operates synergistically with other framework modules responsible for mapping teacher objectives and simulation context:
- Instructional Goal Understanding parses the teacher-lab dialogue to construct a structured simulation representation $S$, comprising learning objectives, key knowledge units ($K$), and relationships among them ($R$).
- At higher TELeR levels, the prompt explicitly incorporates subsets of $S$ (selected knowledge units and relationships), thereby anchoring generation in relevant pedagogical context and simulation structure.
- For example, a Level 3 prompt may take the form: “Given $S$ with its objectives, two knowledge units $k_i, k_j \in K$, and their relationship in $R$, generate a question highlighting the cause-and-effect link between $k_i$ and $k_j$, meeting the justification format requirements, and supporting critical thinking as per the teacher's goals.”
By controlling the volume and specificity of simulation context passed to the LLM, TELeR ensures the generated output is both educationally coherent and structurally compliant.
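As a minimal sketch of this interaction, the example below packages a simulation representation and assembles a Level-3-style prompt from selected knowledge units; the class name, fields, and wording are assumptions made for illustration, not the framework's actual interface.

```python
from dataclasses import dataclass

@dataclass
class SimulationRepresentation:
    """Hypothetical container for the structured simulation representation S:
    learning objectives, key knowledge units, and relationships between units."""
    objectives: list[str]
    knowledge_units: list[str]
    relationships: list[tuple[str, str, str]]  # (unit_a, relation, unit_b)

def level3_question_prompt(sim: SimulationRepresentation, unit_a: str, unit_b: str) -> str:
    """Anchor generation in selected knowledge units and their relationship,
    in the spirit of the Level 3 example above."""
    relations = "; ".join(f"{a} --{r}--> {b}" for a, r, b in sim.relationships)
    return (
        f"Learning objectives: {', '.join(sim.objectives)}\n"
        f"Knowledge units: {', '.join(sim.knowledge_units)}\n"
        f"Relationships: {relations}\n"
        f"Generate a question highlighting the cause-and-effect relationship between "
        f"'{unit_a}' and '{unit_b}', provide a justification in the required format, "
        f"and support critical thinking in line with the teacher's goals."
    )

sim = SimulationRepresentation(
    objectives=["Explain how reactant concentration affects reaction rate"],
    knowledge_units=["reactant concentration", "collision frequency", "reaction rate"],
    relationships=[
        ("reactant concentration", "increases", "collision frequency"),
        ("collision frequency", "increases", "reaction rate"),
    ],
)
print(level3_question_prompt(sim, "reactant concentration", "reaction rate"))
```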
3. Enhancement of Syntactic Validity and Format Adherence
A critical challenge in LLM-based educational content generation is maintaining output parsability (e.g., valid JSON) and strict format adherence (e.g., blanks for cloze-type questions, recognized answer types). The TELeR taxonomy provides a mechanism to modulate this:
- Evaluation data show JSON load accuracy (parsability) of 76–80% across TELeR levels, with Level 2 and Level 3 prompting achieving the highest rates (~80%).
- Format adherence—measured as the percentage of outputs aligning with intended format—exceeds 90% at moderate prompt specificity (Levels 2–3).
- Inclusion of structured response constraints at higher TELeR levels (e.g., the answer must reside in a `<blank>` field; the output must conform to a specific nested structure) directly increases syntactic validity.
Thus, by increasing the prompt’s explicitness, TELeR improves the likelihood that LLM outputs are automatically parsable and reliably formatted for downstream consumption, such as automated grading systems or interactive lab platforms.
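A minimal validation sketch illustrating both checks follows, assuming a hypothetical required schema with 'question', 'answer_type', and 'blank' fields; the field names are placeholders, not the paper's specification.

```python
import json

REQUIRED_FIELDS = {"question", "answer_type", "blank"}  # hypothetical schema

def check_output(raw: str) -> tuple[bool, bool]:
    """Return (parsable, format_ok): whether the raw LLM output loads as JSON,
    and whether it additionally contains all required fields with a non-empty blank."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False, False
    parsable = isinstance(obj, dict)
    format_ok = parsable and REQUIRED_FIELDS <= obj.keys() and bool(obj["blank"])
    return parsable, format_ok

# Aggregate parsability and format adherence over a batch of generations.
outputs = [
    '{"question": "The rate ___ as concentration rises.", "answer_type": "cloze", "blank": "increases"}',
    "not json at all",
]
parsed, adhered = zip(*(check_output(o) for o in outputs))
print(f"JSON load accuracy: {sum(parsed) / len(parsed):.0%}, "
      f"format adherence: {sum(adhered) / len(adhered):.0%}")
```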
4. Impact on Pedagogical Question Quality
Beyond format, TELeR modulation substantively impacts the educational quality of generated questions:
- Evaluation (see paper tables “Validity Results for each TELeR Level” and “Quality Results”) demonstrates that increasing TELeR prompt detail (up to Level 3) correlates with improved scores for fluency, specificity, critical thinking, and instructional alignment.
- Format-specific guidance at Levels 2–3 produces questions that more closely adhere to teacher objectives and lab context, directly supporting cognitive demand (as measured by open-ended/relational format scores, with gains of 0.29–0.39 points).
- Larger LLMs demonstrate stronger baseline adherence and quality, yet their gains are highest when paired with TELeR-optimized prompts: parsability improves by 37.1%, format adherence by 25.7%, and average quality by 0.8 Likert points.
A plausible implication is that, for high-stakes pedagogical applications, optimal TELeR level selection can balance creativity with procedural rigor.
5. Technical Integration and Mathematical Representations
The TELeR taxonomy’s application is algorithmic:
- Simulation representations are modeled as structured mathematical objects, i.e., the representation $S$ above with its learning objectives, knowledge units $K$, and relationships $R$.
- Prompts at Level $\ell$ are constructed cumulatively as $P_\ell = P_{\ell-1} \oplus D_\ell$ (with $P_0$ the empty prompt), where $D_\ell$ encodes the incremental directives specific to TELeR Level $\ell$; this construction is unrolled below.
- This disciplined prompt template ensures reproducibility and allows fine-grained control over input length, semantic coverage, and answer structure.
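Unrolling the recursion makes the cumulative structure explicit (the symbols $P_\ell$ and $D_\ell$ are notation reconstructed here for readability, not necessarily the paper's):

$$
P_\ell \;=\; P_{\ell-1} \oplus D_\ell,\qquad P_0=\varnothing
\quad\Longrightarrow\quad
P_\ell \;=\; D_1 \oplus D_2 \oplus \cdots \oplus D_\ell,
$$

so a Level 3 prompt is simply the concatenation of the Level 1 instruction, the Level 2 clarifications, and the Level 3 considerations drawn from the simulation representation $S$.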
Such explicit mathematical modeling underpins both empirical evaluation and system deployment.
6. Empirical Findings and Recommendations
Empirical results across 19 open-source LLMs and >1,100 generated questions (see cited paper for granular results) indicate:
- JSON parsability plateaus at Level 2 (≈80%), with format adherence near or above 90% at Level 3.
- Quality gains are most pronounced when advancing from minimal (Level 1) to moderately detailed prompts (Level 2–3), stabilizing at higher levels.
- Largest models benefit most from TELeR-based prompting, with adherence and quality metrics substantially improved over naïve or minimally guided inputs.
This suggests targeted TELeR prompt engineering is especially critical when leveraging the generative strengths of frontier LLMs in educational, simulation-aligned content creation.
7. Context, Limitations, and Applicability
The TELeR taxonomy provides a principled mechanism to calibrate instructional specificity within LLM-driven content generation. However, results indicate that excessive detail (beyond Level 3) may not further improve output validity, highlighting the practical need for adaptive prompt tuning. Additionally, while TELeR improves format adherence and pedagogical value, other system components (instructional goal mapping, lab understanding, question taxonomy integration) remain essential complements.
A plausible implication is that, in scalable educational frameworks, TELeR should be incorporated dynamically, allowing instructional designers or teachers to select between prompt levels according to context, desired flexibility, and downstream requirements.
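As a purely illustrative sketch of such dynamic incorporation, a selection helper could map downstream requirements onto a prompt level; the heuristic below is an assumption for demonstration, not a recommendation drawn from the paper.

```python
def select_teler_level(requires_machine_parsing: bool, favors_open_ended: bool) -> int:
    """Toy heuristic mapping downstream requirements to a TELeR level.
    Thresholds are illustrative assumptions; Levels 2-3 showed the best
    parsability/adherence in the reported results, while Level 1 preserves
    maximal creative flexibility."""
    if requires_machine_parsing:   # e.g., automated grading pipelines need strict JSON
        return 3
    if favors_open_ended:          # e.g., discussion prompts tolerate looser structure
        return 2
    return 1
```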
In summary, the TELeR taxonomy serves as a hierarchical, technical standard for controlling prompt detail in LLM-powered question generation, directly impacting output validity, structural adherence, and alignment with instructional goals in virtual lab educational environments. Its integration within broader frameworks substantiates its utility as a “control knob” for prompt engineering in pedagogical AI systems (Knipper et al., 7 Oct 2025).