Positional Curricula for Deep Learning
- Positional curricula are learning strategies that order training instances by explicit difficulty metrics, enhancing optimization dynamics and calibration.
- They leverage task-aligned and inner-state metrics to segment data, impacting convergence speed and generalization performance across varied model capacities.
- Implementation choices—such as forward versus reverse ordering and group-based scheduling—tailor curriculum design to task complexity and model strength.
A positional curriculum is a curriculum learning strategy in which the explicit order or segmentation of training instances—according to prescribed difficulty metrics or episode positions—governs the learning process of a model. This strategy draws from both theoretical and empirical frameworks, with implementations spanning LLMs, deep neural networks, and reinforcement learning (RL) driven by expert demonstrations. The central principle is that the positional arrangement—whether easy-to-hard or hard-to-easy, and defined by specific scalar metrics or trajectory indices—can meaningfully shape optimization dynamics, convergence speed, generalization, and calibration properties of the learning system.
1. Formal Definitions and Metric Dimensions
Positional curricula decompose sample difficulty into two complementary metric families: problem-side (“task-aligned”) and model-side (“inner-state”). Task-aligned metrics assess the intrinsic complexity of problems, while inner-state metrics evaluate model responses and uncertainty.
Task-Aligned Metrics:
- Reasoning Steps (RS): Number of logic steps, estimated automatically.
- Symbol Complexity (SC): Human-calibrated notational richness, scale 1–5.
- Comprehension Difficulty (CD): Human-calibrated prompt ambiguity/context dependence, scale 1–5.
- Empirical Accuracy (ACC@K): $\text{ACC@}K = \frac{1}{K}\sum_{i=1}^{K} c_i$ for $K$ model samples, where $c_i \in \{0,1\}$ is sample correctness.
Inner-State Metrics (a computational sketch follows this list):
- Sequence-Level Perplexity (SLP): $\text{SLP} = \exp\!\big(-\tfrac{1}{T}\sum_{t=1}^{T}\log p_\theta(x_t \mid x_{<t})\big)$, where $\log p_\theta(x_t \mid x_{<t})$ is the model's token log-probability.
- Token-Level Perplexity (TLP): Averaged entropy over predicted distributions per token.
- Logit Gap (LG): Mean log-prob difference between top two candidates.
- Sequence/Token-Level Entropy (SLE/TLE): Indices of predictive uncertainty.
- Decision Variance (VACC): Variance of correctness across the $K$ samples, $\text{VACC} = \tfrac{1}{K}\sum_{i=1}^{K}\big(c_i - \text{ACC@}K\big)^2$.
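The following is a minimal sketch of how several of these per-example quantities can be computed, assuming the model's per-token log-probabilities, the top-two candidate log-probabilities at each step, and the correctness of $K$ sampled answers are available; the function name and array shapes are illustrative, not taken from the cited work.

```python
import numpy as np

def inner_state_metrics(token_logprobs, top2_logprobs, sample_correct):
    """Illustrative computation of SLP, LG, ACC@K, and VACC for a single example.

    token_logprobs : (T,)   log-prob of each generated token under the model
    top2_logprobs  : (T, 2) log-probs of the two highest-scoring candidates per step
    sample_correct : (K,)   0/1 correctness of K sampled answers
    """
    slp = float(np.exp(-np.mean(token_logprobs)))                    # sequence-level perplexity
    lg = float(np.mean(top2_logprobs[:, 0] - top2_logprobs[:, 1]))   # mean logit gap
    acc_at_k = float(np.mean(sample_correct))                        # empirical accuracy ACC@K
    vacc = float(np.var(sample_correct))                             # decision variance
    return {"SLP": slp, "LG": lg, "ACC@K": acc_at_k, "VACC": vacc}
```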
Given any scalar difficulty metric $m(\cdot)$, the dataset $D = \{x_1, \dots, x_N\}$ can be permuted:
- Forward Curriculum (FCL): sort $D$ in ascending order of $m(x)$ (easy to hard).
- Reverse Curriculum (RCL): sort $D$ in descending order of $m(x)$ (hard to easy).
The formal definition enables high-precision ordering and quantile segmentation (e.g. Low/Med/High tiers) of training data for batch presentation (Jia et al., 21 Oct 2025).
2. Construction Algorithms and Implementation Patterns
The construction of positional curricula follows deterministic reordering based on explicit metric scores. The prototypical workflow for offline fine-tuning of LLMs is:

```python
import numpy as np

def build_curriculum(D, m, direction="F"):
    """Permute dataset D by difficulty metric m; 'F' = easy-to-hard (FCL), 'R' = hard-to-easy (RCL)."""
    scores = np.array([m(x) for x in D])   # difficulty score for each example
    sort_idx = np.argsort(scores)          # ascending difficulty
    if direction == "R":
        sort_idx = sort_idx[::-1]          # reverse for hard-to-easy ordering
    return [D[i] for i in sort_idx]        # permuted dataset, consumed in batches of size B
```
For quantile ("group-based") variants:
- Partition Scores into quantiles.
- Shuffle within quantiles.
- Concatenate batches in Low→Med→High (FCL) or High→Med→Low (RCL) order.
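A minimal sketch of the group-based variant, under the same assumptions as the sorting routine above (NumPy-based; the function name, argument names, and the default of three tiers are illustrative):

```python
import numpy as np

def grouped_curriculum(D, m, n_groups=3, direction="F", seed=0):
    """Tiered ordering: shuffle within quantile groups, then concatenate Low->Med->High (or reversed)."""
    rng = np.random.default_rng(seed)
    scores = np.array([m(x) for x in D])
    # Assign each example to a quantile group (0 = easiest tier).
    edges = np.quantile(scores, np.linspace(0, 1, n_groups + 1)[1:-1])
    groups = np.digitize(scores, edges)
    order = []
    tiers = range(n_groups) if direction == "F" else reversed(range(n_groups))
    for g in tiers:
        idx = np.flatnonzero(groups == g)
        rng.shuffle(idx)                     # shuffle within the tier
        order.extend(idx.tolist())
    return [D[i] for i in order]
```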
In RL tasks, positional curricula are implemented via initial state sampling from expert trajectories. Specifically, ACED (Automatic Curricula via Expert Demonstrations) divides each trajectory into sections, samples start states from increasingly earlier sections as the agent improves, and adapts curriculum stages based on episodic return thresholds (Dai et al., 2021).
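As an illustration only (not the authors' implementation), the following sketch shows the general pattern: reset states are drawn from later trajectory sections first and from progressively earlier sections once the agent's recent returns clear a threshold. All class, method, and parameter names, as well as the threshold logic, are assumptions.

```python
import random

class StartStateCurriculum:
    """Illustrative ACED-style schedule: sample episode resets from late demo sections
    first, then unlock earlier sections as performance improves."""

    def __init__(self, demo_trajectories, n_sections=5, return_threshold=0.8):
        self.demos = demo_trajectories        # list of trajectories, each a list of states
        self.n_sections = n_sections
        self.return_threshold = return_threshold
        self.stage = 0                        # stage 0 = start from the final section only

    def sample_start_state(self):
        traj = random.choice(self.demos)
        sec_len = max(1, len(traj) // self.n_sections)
        # Stage k unlocks the last k+1 sections of the trajectory.
        earliest = max(0, self.n_sections - 1 - self.stage)
        section = random.randint(earliest, self.n_sections - 1)
        lo = min(section * sec_len, len(traj) - 1)
        hi = min(len(traj), lo + sec_len)
        return random.choice(traj[lo:hi])

    def update(self, mean_episode_return):
        # Advance to earlier sections once the agent clears the return threshold.
        if mean_episode_return >= self.return_threshold and self.stage < self.n_sections - 1:
            self.stage += 1
```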
3. Empirical Findings Across Domains
Experiments reveal that no single positional curriculum universally dominates. Effectiveness is nuanced and depends on the interaction between model capacity, task complexity, and the direction of ordering.
Key Observations in LLM Fine-Tuning (Jia et al., 21 Oct 2025):
- Forward ordering tends to benefit strong models and simple tasks (e.g. up to +4–6% for ASDiv/GSM8K under RS/SC).
- Reverse ordering is preferable for weaker models or harder OOD tasks (e.g. MATH dataset; +2.3% with RCL under RS).
- SLP: FCL preferred in 11/15 cases; TLP gains are less stable and sometimes favor RCL for hard tasks.
- LG: RCL (high-confidence examples first) wins in 8/9 settings (up to +3.5% gain).
- VACC: RCL generally accelerates adaptation on easy reasoning (+5.9% on MMQA).
- Group-based curricula with tiered slices smooth performance, especially on small models.
RL via Expert Demonstrations (Dai et al., 2021):
- An ACED positional curriculum with a moderate number of trajectory sections (3–8) is crucial; too few sections, or no curriculum at all, fails outright.
- With only 20 demonstrations, block-stacking yields novel solutions that are more efficient than those in the expert data.
- Standard Backplay or reverse-generation approaches do not generalize in continuous control.
Analytical Theory (Saglietti et al., 2021):
- Online (single-pass) curriculum reliably speeds convergence by a factor of $1.1$–$1.5$ over random/shuffled order.
- Asymptotic gains in batch settings require inter-phase coupling via Gaussian priors: up to 10–20% drop in generalization error for sparse tasks.
- Benefit is maximal when the easy slice reveals the support of the solution but not the full solution (the critical regime).
4. Guidelines for Positional Curriculum Design
Empirical results yield actionable principles for selecting and deploying positional curricula.
- Match ordering to model/task regime: Strong model + simple task favors forward ordering; weaker model or hard task recommends reverse ordering, particularly on confidence/variability metrics.
- Metric selection: Task-aligned metrics (RS, SC, CD, ACC) steer representation and generalization; inner-state metrics (SLP, TLP, LG, SLE, TLE, VACC) modulate optimization rate and calibration.
- Tier/group-based scheduling: Partitioning into quantiles reduces sensitivity to ordering noise and enhances stability.
- Multi-metric fusion: Interleaving batches sorted by different metrics (e.g. ACC↑, LG↓) can combine their strengths; a rough sketch follows this list.
- Adaptive schedules: Online re-estimation of difficulty scores may further refine positional assignment, suggesting an open area for future research.
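As a rough illustration of such fusion (an assumed scheme, not a method from the cited papers), batch sequences ordered by different metrics can be interleaved round-robin:

```python
from itertools import zip_longest

def interleave_curricula(*ordered_batch_sequences):
    """Round-robin interleave of batch sequences, each pre-sorted by a different difficulty metric."""
    mixed = []
    for step in zip_longest(*ordered_batch_sequences):
        mixed.extend(batch for batch in step if batch is not None)
    return mixed

# Example (hypothetical inputs): alternate batches sorted by ascending ACC with batches sorted by descending LG.
# schedule = interleave_curricula(batches_by_acc_asc, batches_by_lg_desc)
```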
5. Theoretical Insights and Critical Regimes
Analytical studies have identified explicit regimes where positional curricula are most effective.
- For sparse teacher-student problems, easy-first ordering enables support discovery; asymptotic error is minimized when coupling is introduced between learning phases (Saglietti et al., 2021).
- No benefit is observed for non-sparse tasks or when either slice is overwhelmingly abundant.
- Anti-curriculum (hard-to-easy) can outperform easy-first in select online regimes if initial weights are large.
- Analytical phase diagrams delineate where curriculum placement yields acceleration or generalization gains versus negligible effects.
6. Limitations, Extensions, and Open Questions
- No universal solution: The optimal positional curriculum depends on metric selection, model scale, task structure, and desired effect.
- Metric fusion and hybrid schedules (e.g., sequential use of multiple orderings) remain largely unexplored.
- Proposed new metrics for ordering include chain-of-thought length, per-example gradient norm, and symbolic solver complexity counts.
- Extensions to adaptive online curricula and mixing of reset-state distributions from multiple expert sources may further enhance sample efficiency.
Positional curricula offer a flexible, metric-driven framework for controlling the training trajectory in complex learning systems. By suitably tailoring the order and grouping of samples or episodes according to carefully defined difficulty dimensions, researchers can systematically influence both learning dynamics and generalization, with proven efficacy in domains ranging from mathematical reasoning in LLMs to sparse-reward robotics (Jia et al., 21 Oct 2025, Dai et al., 2021, Saglietti et al., 2021).