Coarse-to-Fine Curriculum in Hierarchical Learning

Updated 17 April 2026

Coarse-to-fine curriculum is a hierarchical training framework that starts with broad, simplified tasks and progressively shifts to complex, detailed representations.
It leverages explicit hierarchies, such as label trees and resolution schedules, to transfer knowledge efficiently across learning phases.
Empirical studies demonstrate that this approach improves computational efficiency and model accuracy in domains like vision, audio, and 3D modeling.

A coarse-to-fine curriculum is a structured training paradigm in which learning proceeds through a sequence of phases, beginning with easier, more abstract (“coarse”) tasks or data representations, and then gradually shifting to harder, more specific (“fine”) tasks or higher-resolution representations. Unlike traditional curriculum learning, which typically manipulates the ordering or weighting of individual training examples, a coarse-to-fine curriculum often leverages explicit hierarchies (e.g., label trees, data resolutions, problem decompositions) or curriculum schedules across modalities, architectures, or data granularities. This approach is deployed across domains including classification, vision, audio modeling, structured prediction, few-shot and incremental learning, mesh-based simulation, and 3D generative modeling.

1. Core Principles and Formal Definitions

The central tenet of coarse-to-fine curriculum learning is to scaffold the training process on a sequence of tasks, targets, or data with increasing granularity or difficulty. The curriculum is typically defined by:

Hierarchy of Tasks or Representations: Denoted as $\mathcal{T}_1 \rightarrow \mathcal{T}_2 \rightarrow \ldots \rightarrow \mathcal{T}_M$, where $\mathcal{T}_1$ is the coarsest (easiest) and $\mathcal{T}_M$ the finest (most complex or detailed) (Stretcu et al., 2021, Shaheen et al., 2024).
Curriculum Schedule: Specifies when and how progressions (switching or blending between tasks/data) occur, such as epochs per phase, or adaptive schedules linked to convergence metrics, entropy, or annealed coefficients (Ren et al., 2018, Feng et al., 2024, Xiang et al., 10 Mar 2026).
Transfer Mechanisms: Knowledge or parameters are transferred from coarser tasks to finer ones, often via parameter sharing, initialization, or output re-encoding (Stretcu et al., 2021, Ren et al., 2018).

Typical mathematical formalisms include sequences of objectives (e.g., marginalized cross-entropy losses across partitioned label spaces), multi-stage architectures with progressive masking, or per-phase regularizers and hyperparameters. An archetypal form for hierarchical supervised classification is:

$L_\ell(\theta) = -\sum_{i} \log \sum_{k \in C_\ell(y_i)} p_\theta(k \mid x_i)$

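for curriculum level $\ell$, where $C_\ell(y_i)$ denotes the set of fine labels that share a level-$\ell$ cluster with the true label $y_i$, so the loss rewards probability mass placed anywhere in the correct cluster (Stretcu et al., 2021, Shaheen et al., 2024).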
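The loss above can be made concrete with a small sketch. It uses only the Python standard library; the function names, the `cluster_of` mapping, and the toy logits are illustrative assumptions rather than code from the cited works.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def level_loss(batch_logits, labels, cluster_of):
    """Sum over examples of -log(probability mass in the true label's cluster).

    cluster_of[k] is the cluster id of fine class k at the current level.
    """
    loss = 0.0
    for logits, y in zip(batch_logits, labels):
        probs = softmax(logits)
        target = cluster_of[y]
        mass = sum(p for k, p in enumerate(probs) if cluster_of[k] == target)
        loss -= math.log(mass)
    return loss

# Hypothetical 4-class problem: classes 0 and 1 share a coarse cluster,
# as do classes 2 and 3. At the finest level each class is its own cluster.
coarse_clusters = [0, 0, 1, 1]
fine_clusters = [0, 1, 2, 3]

logits = [[2.0, 1.0, 0.1, -1.0], [0.0, 0.5, 1.5, 1.0]]
labels = [1, 2]

coarse_loss = level_loss(logits, labels, coarse_clusters)
fine_loss = level_loss(logits, labels, fine_clusters)
```

With singleton clusters the expression reduces to ordinary cross-entropy, and the coarse loss can never exceed the fine loss for the same predictions, because a cluster's mass includes the true class's probability.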

2. Methodological Variants

Hierarchical Label/Task Curricula

Label-based Hierarchies: Constructing a tree over the class labels via affinity or confusion metrics, resulting in a curriculum from coarse superclasses to fine-grained classes (e.g., animal vs. object $\to$ cat/dog/truck) (Stretcu et al., 2021, Shaheen et al., 2024). Both continuous and staged model transfer strategies are used (Stretcu et al., 2021).
Hierarchy in Structured Prediction: Propagating coarse predictions (e.g., semantic segmentation masks, bounding boxes, or class vectors) as additional inputs to fine predictors, often encoded and concatenated to the original input to facilitate joint or progressive training (Ren et al., 2018).
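One way to build a label hierarchy from confusion statistics is greedy agglomerative merging. The sketch below is an assumption-laden illustration, not the clustering procedure of any cited paper: the average symmetric confusion score, the function names, and the toy confusion matrix are all hypothetical.

```python
def merge_until(confusion, clusters, target):
    """Merge the pair of clusters with the highest average symmetric
    confusion until only `target` clusters remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > target:
        best, best_pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                score = sum(confusion[i][j] + confusion[j][i] for i, j in pairs) / len(pairs)
                if score > best:
                    best, best_pair = score, (a, b)
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

def nested_partitions(confusion, cluster_counts):
    """Return class-to-cluster maps, coarsest level first, finest level last.

    Partitions are nested because each coarser level is obtained by
    further merging the clusters of the level below it.
    """
    n = len(confusion)
    clusters = [[k] for k in range(n)]
    levels = []
    for target in sorted(cluster_counts, reverse=True):
        clusters = merge_until(confusion, clusters, target)
        mapping = [0] * n
        for cid, members in enumerate(clusters):
            for k in members:
                mapping[k] = cid
        levels.append(mapping)
    levels.reverse()               # coarsest first
    levels.append(list(range(n)))  # finest level: the original classes
    return levels

# Hypothetical confusion counts for classes cat(0), dog(1), truck(2), car(3).
conf = [
    [50, 9, 1, 0],
    [8, 50, 0, 1],
    [0, 1, 50, 7],
    [1, 0, 6, 50],
]
levels = nested_partitions(conf, [2])
```

Here the two animal classes and the two vehicle classes end up in separate coarse clusters, and each returned map can serve as a class-to-cluster lookup when computing a per-level loss.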

Multi-Resolution and Temporal Curricula

Resolution Scheduling: In domains such as audio spectrogram modeling and mesh-based simulation, training is phased by data resolution: early phases use heavily compressed, coarse data, and later phases increase the resolution (Feng et al., 2024, Garnier et al., 16 Sep 2025).
Time-step or Masking Curricula: Diffusion and masked modeling tasks leverage curriculum schedules over time steps or masking patterns, beginning with coarser guidance (e.g., semantic or high-noise, global cues) and shifting to fine (pixel-level or low-noise, local cues) (Yi et al., 2024, Xiang et al., 10 Mar 2026).
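A resolution schedule can be sketched as a mapping from epoch to a downsampling factor, applied to the input before it reaches the model. The example below uses average pooling along the time axis as a stand-in for whatever compression a given method uses; the schedule values, function names, and pooling choice are assumptions made for illustration.

```python
def downsample_time(spectrogram, factor):
    """Average consecutive groups of `factor` frames along the time axis.

    spectrogram is a list of frames, each a list of frequency-bin values.
    A trailing partial group is averaged over the frames it contains.
    """
    pooled = []
    for start in range(0, len(spectrogram), factor):
        group = spectrogram[start:start + factor]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled

def factor_for_epoch(epoch, schedule):
    """schedule is a list of (first_epoch, factor) pairs sorted by first_epoch."""
    current = schedule[0][1]
    for first_epoch, factor in schedule:
        if epoch >= first_epoch:
            current = factor
    return current

# Hypothetical three-phase schedule: 4x compression, then 2x, then full resolution.
schedule = [(0, 4), (6, 2), (9, 1)]

# Toy spectrogram with 8 frames and 2 frequency bins.
spec = [[float(t), float(2 * t)] for t in range(8)]
coarse = downsample_time(spec, factor_for_epoch(0, schedule))
```

Shorter inputs in the early phases are where the compute savings come from; the model must accept variable-length inputs for the later phases to reuse the same weights.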

Coarse-to-Fine in Generative and Selection Processes

Dataset Distillation: Coarse-to-fine selection processes combine synthetic and real data, where curricular selection of real examples is based on misclassification or "blind spots" of the current model, refining data inclusion from broad errors (coarse) to more subtle deficiencies (fine) (Chen et al., 24 Mar 2025).

Few-Shot, Incremental, and Hierarchical Representation Learning

Incremental Learning (C2FSCIL): Initial learning occurs on coarse labels with contrastive objectives, followed by incremental sessions adding fine classes with small support sets. The embedding is frozen after the coarse stage, and only classifier weights for fine classes are adapted (Xiang et al., 2021, Dai et al., 23 Sep 2025).
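The freeze-then-extend pattern can be illustrated with a nearest-prototype classifier on top of a fixed embedding. This is a simplified sketch under stated assumptions, not the Knowe or HypKnowe method: `embed` is a placeholder for an encoder trained on coarse labels, and class weights are taken to be support-set means that stay fixed once added.

```python
import math

def embed(x):
    """Placeholder for a frozen encoder trained on coarse labels.
    Hypothetical: the identity map keeps the example self-contained."""
    return list(x)

class IncrementalClassifier:
    def __init__(self):
        self.prototypes = {}  # fine class name -> prototype vector

    def add_fine_classes(self, support):
        """support maps a new fine class name to a small list of examples.

        Only weights for the new classes are created; the embedding and
        previously stored prototypes are left untouched.
        """
        for name, examples in support.items():
            if name in self.prototypes:
                raise ValueError(f"class {name!r} already registered")
            embs = [embed(x) for x in examples]
            self.prototypes[name] = [sum(dim) / len(embs) for dim in zip(*embs)]

    def predict(self, x):
        z = embed(x)
        return min(self.prototypes, key=lambda c: math.dist(z, self.prototypes[c]))

clf = IncrementalClassifier()
# Session 1: two fine classes with two support examples each (toy data).
clf.add_fine_classes({"cat": [[0.0, 1.0], [0.2, 0.8]], "dog": [[1.0, 0.0], [0.8, 0.2]]})
snapshot = dict(clf.prototypes)
# Session 2: a new fine class arrives later.
clf.add_fine_classes({"truck": [[5.0, 5.0], [5.2, 4.8]]})
```

Because earlier prototypes are never rewritten, adding a session cannot change the stored weights of older classes, which is one simple way to trade plasticity for stability.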

3. Illustrative Algorithmic Schematics

The implementation of coarse-to-fine curricula can be summarized as follows (Editor’s term: C2F Algorithmic Template):

Hierarchy Construction: Compute pairwise class or data similarities, cluster into $M$ nested partitions (coarse to fine).
Curriculum Loop:
- For each curriculum stage $m = 1, \ldots, M$:
  1. Train (or fine-tune) a model/task for stage $m$; optionally, transfer weights from stage $m-1$.
  2. (If multi-path) Retain best checkpoints for downstream fine-tuning.
- Optionally ensemble multiple fine-stage models to combine decision boundaries (Shaheen et al., 2024).
Switching/Blending: Use time-based, validation-based, or entropy-based schedules to transition between stages (e.g., transition when validation accuracy saturates, or after fixed epochs).
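The curriculum loop and a validation-based switching rule can be sketched as follows. The callables `train_epoch` and `evaluate`, the patience rule, and the toy stand-ins are hypothetical; a real setup would substitute its own training and validation routines.

```python
def run_curriculum(stages, train_epoch, evaluate,
                   max_epochs_per_stage=20, patience=2, min_delta=1e-3):
    """Train through stages ordered coarse to fine.

    train_epoch(params, stage) returns updated parameters and
    evaluate(params, stage) returns a validation accuracy. Parameters
    carry over between stages (transfer by initialization). A stage ends
    when accuracy fails to improve by min_delta for `patience`
    consecutive epochs, or when the epoch budget is exhausted.
    """
    params = None
    history = []
    for stage in stages:
        best, stale = float("-inf"), 0
        for epoch in range(max_epochs_per_stage):
            params = train_epoch(params, stage)
            acc = evaluate(params, stage)
            history.append((stage, epoch, acc))
            if acc > best + min_delta:
                best, stale = acc, 0
            else:
                stale += 1
            if stale >= patience:
                break  # validation accuracy has saturated; move to the next stage
    return params, history

# Toy stand-ins: "parameters" just count epochs spent in each stage,
# and accuracy rises linearly before saturating at 0.9.
def toy_train_epoch(params, stage):
    params = dict(params or {})
    params[stage] = params.get(stage, 0) + 1
    return params

def toy_evaluate(params, stage):
    return min(0.9, 0.3 * params[stage])

final_params, history = run_curriculum(["coarse", "fine"], toy_train_epoch, toy_evaluate)
```

A fixed-epoch schedule is the special case where `patience` is at least `max_epochs_per_stage`, so the early exit never triggers.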

The table below summarizes the coarse and fine levels used in specific domains.

Domain	Coarse Level	Fine Level
Audio Spectrogram Transformers	Downsampled along the time axis	Full-resolution
Chart Classification	2 broad chart groups	15 chart types
Self-supervised Vision (C2FMAE)	Semantic mask / high masking ratio	RGB / pixel-level
3D SDF Learning (DeepSDF)	High-tolerance, smoothness focus	Tight-tolerance, sharp
Incremental Few-shot Learning	Coarse-class identification	Fine-class introduction
Mesh Simulations	Small number of mesh nodes	Full-resolution mesh

4. Empirical Impact and Quantitative Gains

Multiple works substantiate the efficacy of coarse-to-fine curricula:

Audio Spectrogram Transformers: A two-phase, coarse-to-full-resolution curriculum uses 70–75% fewer FLOPs and 30–40% less wall-clock time, with small accuracy improvements (Feng et al., 2024).
Chart Classification: C2F-CHART achieves a 0.8 percentage-point improvement in F1-score over Swin-Chart with a coarse-to-fine schedule plus ensembling (Shaheen et al., 2024).
3D SDF Reconstruction: Curriculum DeepSDF reduces Chamfer Distance by 32.3% compared to the baseline; both surface-tolerance and sample-difficulty scheduling are critical (Duan et al., 2020).
Dense Retrieval: CL-DRD increases MRR@10 and nDCG@10 over baselines on the MS MARCO and TREC-DL benchmarks (Zeng et al., 2022).
Mesh-based Simulations: The curriculum cuts wall-clock time by up to 50%, improves generalization error, and overcomes loss plateaus (Garnier et al., 16 Sep 2025).
Incremental Few-shot: Knowe and HypKnowe optimize the stability-plasticity trade-off, maximizing average accuracy and minimizing forgetting (Xiang et al., 2021, Dai et al., 23 Sep 2025).
Generative 3D Modeling: Time-step curriculum in diffusion models yields sharper, multi-view consistent geometry and lower error rates (Yi et al., 2024).

5. Limitations, Open Problems, and Best Practices

Limitations and subtleties include:

Curriculum Hyperparameters: The schedule must fix the number of epochs per phase, the switching criteria, and the curriculum depth; in practice these are often set by manual heuristics or task-specific tuning (Stretcu et al., 2021, Garnier et al., 16 Sep 2025).
Hierarchy Construction: Automatic clusterings may not always yield optimal partitions; integrating human-defined or task-informed taxonomies remains an open issue (Shaheen et al., 2024).
Transferability: Some curricula require models with flexible positional embeddings, decoupled encoder-decoder architectures, or specific transfer mechanisms (Feng et al., 2024, Stretcu et al., 2021).
Applicability to Arbitrary Tasks: Output-space coarse-to-fine is most natural in classification or sequence tasks with explicit hierarchies; extension to regression or more structured prediction is nontrivial (Stretcu et al., 2021).
Over-compression or Oversmoothing: Overly aggressive coarse stages (e.g., extreme temporal compression or excessive graph sub-sampling) may result in unrecoverable information loss (Feng et al., 2024, Garnier et al., 16 Sep 2025).

Empirically supported best practices for deploying a coarse-to-fine curriculum:

Begin with a pretraining or curriculum phase at coarsest granularity or lowest resolution for stable, efficient parameter initialization (Feng et al., 2024, Garnier et al., 16 Sep 2025).
Employ a small number (typically 2–3) of curriculum phases; diminishing returns are observed beyond this (Feng et al., 2024, Stretcu et al., 2021).
Reset or anneal optimizer schedules on each stage transition (Garnier et al., 16 Sep 2025).
Align the decoder or output head structure with the curriculum’s data hierarchy (e.g., cascaded decoders for hierarchical masked autoencoding) (Xiang et al., 10 Mar 2026).
Validate with ablations, comparing orderings and excluding curriculum components to isolate their additive value (Ren et al., 2018, Shaheen et al., 2024).
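Resetting the optimizer schedule at each stage transition can be sketched as a learning-rate function that restarts its annealing at every stage boundary. Cosine annealing, the stage lengths, and the learning-rate bounds below are illustrative assumptions; the source does not prescribe a particular schedule shape.

```python
import math

def stage_lr(step, stage_lengths, base_lr=1e-3, min_lr=1e-5):
    """Cosine-annealed learning rate that restarts at the start of each stage.

    stage_lengths lists the number of optimizer steps in each curriculum
    stage, coarse to fine. Steps beyond the last stage return min_lr.
    """
    start = 0
    for length in stage_lengths:
        if step < start + length:
            progress = (step - start) / max(1, length - 1)
            return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
        start += length
    return min_lr

# Hypothetical two-stage curriculum: 10 coarse steps, then 20 fine steps.
stage_lengths = [10, 20]
lrs = [stage_lr(s, stage_lengths) for s in range(30)]
```

The rate decays to `min_lr` by the last step of the coarse stage and jumps back to `base_lr` on the first step of the fine stage, giving the optimizer room to adapt to the new inputs or targets.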

6. Domain-Specific Instantiations

Audio and Temporal Modeling: Multi-phase spectrogram curricula optimize training cost and performance by manipulating input time-axis resolution and patchification strategy (Feng et al., 2024).

Visual Recognition and Multi-Granular Representation Learning: Hierarchical curricula enable robust learning of both global and fine object structure; progressive masking and cascaded decoding improve self-supervised representation robustness (Ren et al., 2018, Xiang et al., 10 Mar 2026).

3D Shape and Diffusion-Based Generation: Curriculum learning on surface tolerance, sample weighting, or diffusion time steps improves geometric fidelity and generalization in both signed distance field regression and 3D generation from single images (Duan et al., 2020, Yi et al., 2024).

Incremental and Few-Shot Learning: Sequential freezing and fine-class addition (Knowe, HypKnowe) achieve stability against catastrophic forgetting while still allowing plasticity for new classes, demonstrated on C2FSCIL and related benchmarks (Xiang et al., 2021, Dai et al., 23 Sep 2025).

Dense Retrieval and Structured Distillation: Scheduling the composition or granularity of training pairs (from coarse groupings to fine top-k ordering) systematically controls the difficulty and sharpens rank modeling (Zeng et al., 2022).

7. Outlook and Generalization

The coarse-to-fine curriculum paradigm unifies a wide spectrum of advances grounded in hierarchical, progressive, or staged learning. It offers a principled way to decompose complex learning problems, anchoring learning first in broadly discriminative, robust representations before specializing to task-relevant detail. The works surveyed here report quantitative improvements in efficiency and generalization across classification, dense retrieval, self-supervised modeling, generative geometry, and simulation surrogacy. Future directions include extending hierarchies to deeper or adaptive label trees, integrating more tightly with meta-learning and reinforcement learning for automatic curriculum synthesis, and applying the paradigm to a broader range of structured prediction tasks (Stretcu et al., 2021, Shaheen et al., 2024, Garnier et al., 16 Sep 2025).