Teaching Sets and Curriculum Construction

Updated 21 July 2025

Teaching sets are the minimal collection of examples required to unambiguously convey a concept, forming a basis for optimized teaching methods in both human and machine learning.
Algorithmic strategies, including greedy and I-search methods, systematically construct teaching sets while addressing challenges like conditional dependencies and interposition.
Curriculum construction organizes and sequences teaching sets using data-driven and ontological frameworks to enhance comprehension, transfer, and skill acquisition.

Teaching sets and curriculum construction refer to the systematic selection, organization, and sequencing of instructional content and exemplars that enable efficient learning, robust comprehension, and skill acquisition for both human learners and artificial agents. Across diverse domains—ranging from foundational pedagogy and educational data mining to algorithmic machine teaching—these concepts are grounded in formal theories, empirical studies, and practical frameworks that connect the structure of teaching materials with measurable learning outcomes.

1. Foundational Theories and Definitions

Teaching sets formalize the minimal collection of examples required to unambiguously convey a concept to a learner, particularly in machine learning and learning theory contexts. The teaching dimension (commonly denoted $\operatorname{TS}_{\min}$) quantifies this minimal size for a given concept class $\mathcal{C}$, often parameterized by key properties such as the Vapnik–Chervonenkis (VC) dimension $d$ (Compton et al., 6 May 2025). The core problem is to identify or construct a teaching set $S$ such that, given $S$, only the target concept remains plausible with respect to $\mathcal{C}$.

Curriculum construction encompasses the design of both the content (teaching sets) and their presentation sequence, often informed by pedagogical principles, difficulty metrics, and desired transfer/retention outcomes. Curricula may be optimized for humans, artificial agents, or mixed (human-in-the-loop) systems, and can be implemented manually or via automated algorithmic frameworks (Matiisen et al., 2017, Singh et al., 2022, Yengera et al., 2021, Saglietti et al., 2021).

2. Algorithmic Perspectives on Teaching Set Construction

Greedy algorithms have been extensively studied as practical methods for constructing teaching sets. In the canonical greedy approach, at each iteration the algorithm selects a subset $T \subseteq \mathcal{X}$ of the example domain, of size at most a fixed $k$, together with a labeling $b$ of $T$, so as to shrink the current concept class as much as possible. The process recurses on the restricted class until a single concept remains (Compton et al., 6 May 2025). Formally, each step chooses $(T^*, b^*) = \operatorname{argmin}_{T,b} \left|\, \mathcal{C}|_{T,b} \,\right|$, with

$\mathcal{C}|_{T,b} = \{ c \in \mathcal{C}: \forall x \in T, c(x) = b_x \},$

where $b$ is a labeling of $T$.
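The selection rule above can be sketched for a small finite concept class. This is a minimal illustration, not the construction analyzed by Compton et al.: it assumes binary labels, represents each concept as a tuple indexed by domain position, and considers only nonempty restrictions so that at least one concept always survives. The function names `restrict` and `greedy_teaching_set` are hypothetical.

```python
from itertools import combinations, product


def restrict(concepts, T, b):
    """Return C|_{T,b}: the concepts that agree with labeling b on every x in T."""
    return [c for c in concepts if all(c[x] == bx for x, bx in zip(T, b))]


def greedy_teaching_set(concepts, k=1):
    """Greedily shrink the class with labeled subsets of size <= k.

    Returns (taught, concept), where `taught` maps domain index -> label and
    `concept` is the single concept consistent with every taught example.
    """
    concepts = list(dict.fromkeys(concepts))  # drop duplicates, keep order
    n = len(concepts[0])
    taught = {}
    while len(concepts) > 1:
        best = None
        for size in range(1, k + 1):
            for T in combinations(range(n), size):
                for b in product((0, 1), repeat=size):
                    remaining = restrict(concepts, T, b)
                    # Assumption: skip empty restrictions, then minimize |C|_{T,b}|.
                    if remaining and (best is None or len(remaining) < len(best[2])):
                        best = (T, b, remaining)
        T, b, concepts = best
        taught.update(zip(T, b))
    return taught, concepts[0]


# Toy class over a 4-point domain: the empty concept plus the four singletons.
toy_class = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
print(greedy_teaching_set(toy_class, k=1))  # ({0: 1}, (1, 0, 0, 0))
```

The exhaustive search over subsets and labelings grows quickly with $k$ and the domain size, so the sketch is only suitable for toy classes.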

Lower bounds show that for small $k$, the greedy method may not achieve the conjectured optimal teaching dimension $O(d)$. For example, with $k=1$, the best case is $O(\log |\mathcal{C}|)$; for $k=2$, bounds such as $O(\log \log |\mathcal{C}|)$ hold but are not strictly tied to $d$ (Compton et al., 6 May 2025). This demonstrates that, to efficiently construct minimal teaching sets for certain concept classes, higher-order interactions (jointly considering larger subsets of examples in each step) become necessary.

This insight implies that curriculum design, when strictly incremental and local (greedy), may miss global structural efficiencies, underscoring the importance of integrative curriculum strategies that bundle multiple concepts or use example sets that encode richer structural information.

3. Theoretical Extensions: Conditional Teaching Size and Curriculum Ordering

Conditional teaching size extends the notion of teaching sets to curriculum settings where concepts are not taught in isolation. Given already learned (or previously taught) concepts $\{c_1, \ldots, c_n\}$ forming a library $B$, the conditional teaching size $TS_\ell^f(a|B)$ captures the additional teaching cost for a new concept $a$ when $B$ is available (Garcia-Piqueras et al., 2021). This framework enables reuse of prior knowledge and reflects human and machine learning where earlier-acquired primitives facilitate acquisition of new, more complex concepts.

A critical and perhaps counterintuitive finding is the "interposition phenomenon": prior knowledge in $B$ can sometimes increase the conditional teaching size for new concepts, because previously learned programs or concepts can interfere and require extra disambiguation in teaching sets. This contrasts with conditional Kolmogorov complexity, where $K(a|b) \le K(a)$ holds up to an additive constant. Garcia-Piqueras et al. (2021) formalize interposition as cases where $TS_\ell^f(c|B) > TS_\ell^f(c)$. To optimize curricula under these constraints, the I-search algorithm constructs orderings that avoid interposition and minimize overall curriculum teaching cost.

These results highlight the nuanced trade-offs in curriculum design: optimal teaching order may not follow naive dependence or complexity ordering, and effective strategies must balance the facilitation provided by building blocks with the potential for interference.
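The interposition phenomenon described above can be illustrated with a deliberately simplified toy model. The sketch below is not the formal setting of Garcia-Piqueras et al.: it assumes a learner that returns the first hypothesis, in a fixed simplicity order, that is consistent with the examples; it measures teaching size as the number of examples; and it models the library only as a reordering in which one hypothesis becomes "simpler" and moves ahead of the target. All names (`learner`, `teaching_size`, `h0`, `h1`) are hypothetical.

```python
from itertools import combinations


def learner(ordered_hypotheses, examples):
    """Return the first hypothesis (simplest first) consistent with all examples."""
    for h in ordered_hypotheses:
        if all(h[x] == y for x, y in examples):
            return h
    return None


def teaching_size(ordered_hypotheses, target):
    """Smallest number of target-labeled examples that makes the learner output target."""
    domain = range(len(target))
    for size in range(len(target) + 1):
        for xs in combinations(domain, size):
            examples = [(x, target[x]) for x in xs]
            if learner(ordered_hypotheses, examples) == target:
                return size, examples
    return None


h0 = (0, 1, 0)      # differs from the target only at x = 0
target = (1, 1, 0)
h1 = (1, 0, 0)      # differs from the target only at x = 1

without_library = [h0, target, h1]  # h1 is "more complex" than the target
with_library = [h0, h1, target]     # assumed: the library makes h1 cheaper to express

print(teaching_size(without_library, target))  # (1, [(0, 1)])
print(teaching_size(with_library, target))     # (2, [(0, 1), (1, 1)])
```

In this toy ordering the library lets `h1` interpose between `h0` and the target, so one extra example is needed to rule it out, which mirrors the inequality $TS_\ell^f(c|B) > TS_\ell^f(c)$ in spirit only.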

4. Automated and Data-Driven Curriculum Construction

Recent research has emphasized automated algorithmic curriculum construction, particularly in machine learning and reinforcement learning. Frameworks such as Teacher-Student Curriculum Learning (TSCL) (Matiisen et al., 2017), Data Curriculum for Reinforcement Learning (DCUR) (Seita et al., 2021), and Curriculum Designer (CD) (Singh et al., 2022) dynamically or algorithmically structure the order and content of teaching sets.

TSCL formalizes the problem in a teacher-student framework where the teacher selects tasks based on the observed learning progress of the student, specifically targeting subtasks where the student is making the fastest gains or beginning to forget previously learned material.
DCUR implements explicit data curricula by controlling the window of teacher data available to the student at each point in training. Additive and scaled curricula, parameterized by window width or scaling factors, gradually increase the diversity and difficulty of examples, mirroring human educational scaffolding and supporting stable convergence in offline RL.
The Curriculum Designer algorithm uses inter-class feature similarity matrices to rank and select curricula that maximize transfer and minimize forgetting. Empirical studies demonstrate a strong correlation between curricula optimized for continual learning algorithms and those effective for humans (Singh et al., 2022).
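The teacher-student selection rule described for TSCL above can be sketched as a small task scheduler. This is a hedged illustration rather than the published algorithm: the exponential moving average, the epsilon-greedy exploration, and the class name `ProgressTeacher` with its parameters `alpha` and `epsilon` are assumptions of this sketch. Using the absolute value of progress means a task is also revisited when the student's score on it starts to drop.

```python
import random


class ProgressTeacher:
    """Pick the task whose score is changing fastest (gains or forgetting)."""

    def __init__(self, n_tasks, alpha=0.3, epsilon=0.1, seed=0):
        self.progress = [0.0] * n_tasks      # smoothed score change per task
        self.last_score = [None] * n_tasks   # most recent score per task
        self.alpha = alpha                   # smoothing factor (assumed)
        self.epsilon = epsilon               # exploration rate (assumed)
        self.rng = random.Random(seed)

    def choose_task(self):
        # Occasionally explore so that stale progress estimates get refreshed.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.progress))
        return max(range(len(self.progress)), key=lambda i: abs(self.progress[i]))

    def update(self, task, score):
        # Track the smoothed difference between consecutive scores on a task.
        prev = self.last_score[task]
        if prev is not None:
            delta = score - prev
            self.progress[task] = (
                self.alpha * delta + (1 - self.alpha) * self.progress[task]
            )
        self.last_score[task] = score


teacher = ProgressTeacher(n_tasks=2, epsilon=0.0)
teacher.update(0, 0.1)
teacher.update(0, 0.5)   # task 0 is improving
teacher.update(1, 0.9)
teacher.update(1, 0.6)   # task 1 is slipping, but more slowly
next_task = teacher.choose_task()
```

In a training loop, the student would train on `next_task`, report its new score through `update`, and the teacher would choose again.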

Structural or graph-based approaches, such as knowledge graph construction (Weng et al., 2020) and semantic curriculum ontologies (Christou et al., 6 Jun 2025), provide alternative algorithmic foundations for curriculum design. By representing learning materials, topics, prerequisites, and learner personas as structured entities in a knowledge graph or ontology, these methods support personalized curriculum construction, dynamic adaptation, and cross-resource integration.

5. Human-Centric and Domain-Specific Curriculum Design

In applied educational settings, curriculum construction is informed by domain requirements, cognitive principles, and user diversity.

Human-Centric Software Engineering (HCSE) curricula emphasize integrating soft skills, ethics, HCI/UI considerations, cultural awareness, and requirements engineering alongside technical content (McKenzie et al., 10 Jul 2024). Scaffolded progression aligns teaching themes with the software engineering lifecycle, often with project-based and reflective activities to reinforce human-centered competencies.
Science education research emphasizes research-driven curriculum construction, integrating project-based learning, explicit development of mathematization skills, conceptual integration (e.g., in rotational dynamics), and alignment with teacher and student epistemological beliefs (Guisasola et al., 15 Jul 2024).
The backward design framework (Cooksey et al., 2022) asserts that curriculum construction should begin by defining learning objectives, then designing aligned pre-/post-assessments that inform iterative adjustment based on normalized gain statistics:

$g = \frac{\text{Post}\% - \text{Pre}\%}{100\% - \text{Pre}\%}$

This data-driven approach to curriculum and teaching set evaluation supports tracking efficacy over time.
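As a worked illustration of the normalized gain formula above, the following sketch computes $g$ from pre- and post-assessment percentages. The function name `normalized_gain` and the input validation are assumptions added for this example.

```python
def normalized_gain(pre_percent, post_percent):
    """Normalized gain g = (Post% - Pre%) / (100% - Pre%)."""
    if not 0 <= pre_percent < 100:
        # g is undefined when the pre-test score is already 100%.
        raise ValueError("pre_percent must be in [0, 100)")
    if not 0 <= post_percent <= 100:
        raise ValueError("post_percent must be in [0, 100]")
    return (post_percent - pre_percent) / (100 - pre_percent)


# A class averaging 40% before instruction and 70% after it
# has closed half of the gap to a perfect score.
print(normalized_gain(40, 70))  # 0.5
```

A negative value indicates that the post-assessment score fell below the pre-assessment score.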

Decision-tree models like the Quantum Curriculum Transformation Framework (QCTF) systematize curriculum creation in emerging fields by guiding instructors through topic selection, targeted skill definition (theory, computation, experiment), learning goal articulation (adapting Bloom’s taxonomy), and teaching method selection (scaffolding, inquiry-based, MERs) (Goorney et al., 2023).

6. Pedagogical Structures, Knowledge Graphs, and Ontological Approaches

Knowledge graph–based representations and semantic ontologies offer models for systematically describing curricula, teaching sets, and their interrelations (Weng et al., 2020, Christou et al., 6 Jun 2025). These frameworks:

Modularize curriculum elements into hierarchical graphs—e.g., nodes for School, Teacher, Student, Course, Knowledge Point, Exercise Type—enabling both top-down planning and bottom-up learning analytics.
Support dynamic, individualized learning paths by representing Personas and LearningPaths as first-class entities, allowing the curriculum to be adapted to differing backgrounds and needs.
Use explicit relationships and sequencing (e.g., hasNextLearningStep, coversTopic) to enforce pedagogical principles such as scaffolding and coherent progression.

Validation through competency questions, together with materialization in open knowledge networks, demonstrates that these ontologies enable dense interlinking of modular content, answer complex curriculum-related queries, and support both retrieval and adaptive sequencing for human learners and AI systems.
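The sequencing relationships described above can be sketched with plain dictionaries. This is a minimal sketch, not the ontology from the cited work: the relationship names `hasNextLearningStep` and `coversTopic` come from the text, while the step names, topics, and the functions `learning_path` and `steps_covering` are hypothetical. It assumes Python 3.9+ for the standard-library `graphlib` module and models a persona simply as the set of steps a learner already knows.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical curriculum fragment; step and topic names are illustrative only.
HAS_NEXT_LEARNING_STEP = {
    "teaching_sets": ["greedy_construction", "conditional_teaching"],
    "greedy_construction": ["curriculum_ordering"],
    "conditional_teaching": ["curriculum_ordering"],
    "curriculum_ordering": [],
}
COVERS_TOPIC = {
    "teaching_sets": {"machine teaching"},
    "greedy_construction": {"machine teaching", "algorithms"},
    "conditional_teaching": {"machine teaching", "prior knowledge"},
    "curriculum_ordering": {"curriculum design"},
}


def learning_path(next_steps, known=frozenset()):
    """Order steps so each one follows its predecessors, skipping known steps."""
    predecessors = {step: set() for step in next_steps}
    for step, successors in next_steps.items():
        for nxt in successors:
            predecessors.setdefault(nxt, set()).add(step)
    ordered = TopologicalSorter(predecessors).static_order()
    return [step for step in ordered if step not in known]


def steps_covering(topic, covers_topic):
    """Answer a simple competency question: which steps cover a given topic?"""
    return sorted(step for step, topics in covers_topic.items() if topic in topics)


full_path = learning_path(HAS_NEXT_LEARNING_STEP)
experienced_path = learning_path(HAS_NEXT_LEARNING_STEP, known={"teaching_sets"})
machine_teaching_steps = steps_covering("machine teaching", COVERS_TOPIC)
```

A production ontology would typically be expressed in RDF/OWL and queried with SPARQL; the dictionaries here only show how explicit sequencing edges yield a valid, persona-dependent learning path.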

7. Implications, Challenges, and Future Research

The convergence of formal theory, empirical results, and domain-driven design yields several broad implications:

Simple, incremental (greedy) teaching strategies are fundamentally limited for some concept classes and curriculum goals. More effective curriculum construction often requires explicit modeling of higher-order interactions and integrative example sets (Compton et al., 6 May 2025).
Conditional dependencies and interference effects dictate that optimal teaching orders cannot be inferred solely from concept complexity or prerequisite structure—algorithms must explicitly model interposition and context-dependent teaching size (Garcia-Piqueras et al., 2021).
Automated, data-driven, and ontology-based approaches to curriculum construction promise scalable, adaptive, and individualized learning pathways but require careful alignment with educational objectives, learner modeling, and empirical efficacy validation (Singh et al., 2022, Christou et al., 6 Jun 2025, Weng et al., 2020).
In domain-specific settings, integrating human-centric, ethical, and reflective dimensions into the curriculum scaffolding is essential for producing graduates capable of addressing societal and user needs (McKenzie et al., 10 Jul 2024).
Research-based, assessment-led (e.g., backward design) curriculum construction frameworks provide mechanisms for objective evaluation and continuous improvement (Cooksey et al., 2022).

Challenges remain in resolving theoretical questions related to the optimal teaching dimension, integrating cross-domain knowledge graphs, efficiently searching curriculum orderings, and extending compositional teaching protocols to universal languages and complex machine learning models. Ongoing research is likely to further explore the trade-offs in teaching set construction, curriculum ordering algorithms, and cross-contextual generalization, with broad implications for both AI and human education.