Coarse-to-Fine Initialization

Updated 7 February 2026
  • Coarse-to-fine initialization is a hierarchical strategy that begins with a coarse approximation to capture global structure and then incrementally refines solutions across various applications.
  • This approach improves convergence speed and accuracy by transferring parameters and representations from coarser to finer levels, reducing sensitivity to local minima.
  • It is widely applied in deep learning, probabilistic inference, and robotics, making it a key method for effective multilevel optimization and knowledge transfer.

Coarse-to-fine initialization is a principled strategy for introducing global structure and hierarchical refinement into learning, inference, and optimization algorithms. It proceeds by first producing an approximate or partial solution at a coarse level—whether in semantic, spatial, temporal, or representational granularity—and then incrementally refining this solution at finer levels, each stage benefiting from the initialization, signal, or structural priors provided by its predecessor. This methodology is pervasive across domains: from deep learning architectures (e.g., segmentation, classification, meta-embedding, few-shot learning) to probabilistic inference, optimization, and dynamical system simulation. The approach facilitates faster convergence, reduces sensitivity to local minima, and enables effective parameter and representation transfer across task hierarchies.

1. Hierarchical Parameter and Representation Initialization

The transfer of learned parameters or representations from coarse to fine task hierarchies is a central theme. In continual semantic segmentation, the CCDA approach introduces a coarse-to-fine weight initialization rule: when a coarse class is split into finer subclasses, the new classifier weights for the fine classes are directly copied from their parent’s weights, and the bias is set so that the prior probability mass is evenly split among children. This preserves semantic continuity and balances softmax outputs, thus stabilizing training and enabling effective knowledge distillation (Shenaj et al., 2022). Similarly, in meta-embedding for recommendation, a coarse codebook is established first, after which a sparse fine codebook is bootstrapped from the coarse one via SparsePCA, with weight bridging and soft-thresholding enforcing both hierarchical semantic structure and memory efficiency (Wang et al., 21 Jan 2025). In hyperbolic few-shot class-incremental learning, fine-class classifier weights are initialized from the (L2-normalized and frozen) prototypes of their parent coarse classes, encoding tree-structured priors into the feature space for superior transfer and few-shot generalization (Dai et al., 23 Sep 2025).
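The prototype-based transfer described above can be sketched as follows. This is an illustrative toy, not the cited papers' exact procedure: the helper name `init_fine_classifier` and the example prototypes are hypothetical, and it simply copies each coarse parent's L2-normalized prototype into its children's weight rows.

```python
import numpy as np

def init_fine_classifier(coarse_prototypes, parent_of, num_fine, dim):
    """Initialize each fine class's weight vector from the L2-normalized
    prototype of its coarse parent (hypothetical helper for illustration)."""
    weights = np.zeros((num_fine, dim))
    for fine_id in range(num_fine):
        proto = coarse_prototypes[parent_of[fine_id]]
        weights[fine_id] = proto / np.linalg.norm(proto)  # unit-norm parent prototype
    return weights

# Two coarse classes, each split into two fine subclasses.
protos = {0: np.array([3.0, 4.0]), 1: np.array([0.0, 2.0])}
parent = {0: 0, 1: 0, 2: 1, 3: 1}  # fine class id -> coarse parent id
W = init_fine_classifier(protos, parent, num_fine=4, dim=2)
```

Because siblings start from the same direction, the fine classifier initially ties them, and training then differentiates the subclasses while retaining the parent's coarse decision geometry.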

2. Architectural Realizations: Stacked Modules and Multi-Resolution Pipelines

Coarse-to-fine initialization frequently manifests as stacked module design or multi-resolution pipelines. In fine-grained segmentation and parsing, stacked fully-connected modules are appended, each operating at increasingly fine semantic granularity. The output of each coarse module directly initializes the input to the next, while skip connections from intermediate encoder layers re-inject localized detail, resulting in improved mean IoU and boundary delineation across a variety of datasets (Hu et al., 2018). Analogous strategies occur in volumetric registration, where affine transformation parameters are estimated at successive spatial scales using trilinear downsampling and progressive transformer blocks, each stage leveraging the output of coarser-resolution alignments to robustly initialize finer-scale registration (Mok et al., 2022). Multi-resolution approaches of this kind ensure that early stages recover large-scale structure and later stages correct fine details, yielding both speed and accuracy.
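A minimal numeric stand-in for such multi-resolution pipelines is integer-shift image alignment on an average-pooled pyramid: each level searches a small window around the doubled estimate from the coarser level. This is a deliberately simplified sketch of the registration idea above (the cited works estimate full affine transforms); all function names are illustrative.

```python
import numpy as np

def downsample(img, factor):
    """Average-pool a 2-D array by an integer factor (shape assumed divisible)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def align_translation(fixed, moving, levels=3, search=2):
    """Estimate an integer shift s with roll(moving, s) ~= fixed by a
    coarse-to-fine local search: each level refines the doubled coarse estimate."""
    shift = np.zeros(2, dtype=int)
    for level in reversed(range(levels)):
        shift *= 2  # carry the coarse estimate to the next finer grid
        f = downsample(fixed, 2 ** level)
        m = downsample(moving, 2 ** level)
        best, best_err = shift.copy(), np.inf
        for dy in range(shift[0] - search, shift[0] + search + 1):
            for dx in range(shift[1] - search, shift[1] + search + 1):
                err = np.sum((np.roll(m, (dy, dx), axis=(0, 1)) - f) ** 2)
                if err < best_err:
                    best, best_err = np.array([dy, dx]), err
        shift = best
    return tuple(int(v) for v in shift)

# A smooth blob shifted by a known offset; the pyramid recovers it.
y, x = np.meshgrid(np.arange(32), np.arange(32), indexing="ij")
fixed = np.exp(-((y - 16) ** 2 + (x - 16) ** 2) / 20.0)
moving = np.roll(fixed, (-6, 4), axis=(0, 1))
shift = align_translation(fixed, moving)
```

The coarsest level only needs a window large enough to capture the global offset divided by the pooling factor; each finer level then corrects residual error of at most a few pixels, which is exactly the speed/accuracy trade-off the text describes.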

3. Coarse-to-Fine in Learning, Curriculum, and Semi/Unsupervised Regimes

In supervised and semi-supervised learning, coarse-to-fine initialization is formalized via curriculum. In Coarse-to-Fine Curriculum Learning for classification, label hierarchies are automatically constructed by clustering class-embedding vectors, after which the model is first trained to predict at the coarsest level (e.g., animal vs. object) and then incrementally at finer levels (e.g., species or object subclass), each stage carrying forward parameters from the previous—either via continuous marginalized training loss or explicit weight transfer for each stage's classifier head (Stretcu et al., 2021). A universal benefit is observed: such curricula consistently induce higher accuracy, faster convergence, and better sample efficiency, particularly when the number of classes is large or labeled data is scarce. In weakly supervised segmentation, an unsupervised or pseudo-supervised coarse foreground mask initialization is recursively refined through graph-cut denoising and FCN training rounds, with each iteration's output serving as initialization for the next; this bootstrapping approach closes the gap to methods with stronger supervision (Jing et al., 2018).
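The hierarchy-construction step can be sketched as k-means over class-embedding vectors, which groups fine classes into coarse super-classes for the first curriculum stage. This is a simplified reading of the clustering idea in Stretcu et al. (2021); the helper name and toy embeddings are illustrative.

```python
import numpy as np

def build_coarse_labels(class_embeddings, num_coarse, iters=20, seed=0):
    """Cluster class-embedding vectors (plain k-means) to auto-construct a
    two-level label hierarchy; returns assign[fine_class] -> coarse_class."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(class_embeddings), num_coarse, replace=False)
    centers = class_embeddings[idx].copy()
    for _ in range(iters):
        # Assign each fine class to its nearest coarse center.
        d = np.linalg.norm(class_embeddings[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)
        # Recompute centers from current assignments.
        for k in range(num_coarse):
            if np.any(assign == k):
                centers[k] = class_embeddings[assign == k].mean(axis=0)
    return assign

# Four fine classes whose embeddings form two well-separated groups.
emb = np.array([[0.0, 0.0], [0.2, 0.0], [10.0, 0.0], [10.2, 0.0]])
coarse = build_coarse_labels(emb, num_coarse=2)
```

The resulting coarse labels define the first training stage; after convergence, the coarse head's weights seed the fine head (e.g., each fine class inherits its parent's row), completing the curriculum hand-off.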

4. Coarse-to-Fine Initialization in Probabilistic Inference and Optimization

Probabilistic inference for structured models or dynamical systems substantially benefits from hierarchical initialization. In probabilistic programs, the target state space is coarsened recursively via user-defined functions, yielding a ladder of intermediate distributions on abstracted domains. Sequential Monte Carlo (SMC) samplers operate over these levels, initializing at the coarse scale where global modes are accessible and refining particles by conditional importance sampling and particle rejuvenation, preserving marginal correctness at the finest level. This construction both reduces weight variance and mitigates particle impoverishment, exploiting model structure for scalable inference (Stuhlmüller et al., 2015). Analogous principles govern equation-free computation for multiscale dynamics: a wrapper initializes the state on a slow manifold with desired coarse observables by short bursts of the fine-scale simulator; Newton iteration in the coarse variables suffices for accurate manifold initialization, circumventing the need to derive explicit slow equations (Vandekerckhove et al., 2010). In trajectory optimization, coarse keyframe-constrained plans are computed via graph search and serve as initialization for fine-scale adjoint-based continuous refinement, leading to rapid and high-fidelity satisfaction of complex spatio-temporal constraints (Han et al., 2022).
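A minimal one-dimensional stand-in for the coarse-to-fine sampling ladder: weight coarse grid cells by the target density at their centers, resample cells (where global modes are found cheaply), then refine particles uniformly within their cells with an importance-weight correction. The function names and the toy bimodal target are illustrative, not from Stuhlmüller et al. (2015).

```python
import numpy as np

def coarse_to_fine_sample(log_density, lo, hi, n_cells=50, n_particles=2000, seed=0):
    """Two-level sampler: coarse cells weighted by density at their centers,
    then fine refinement within cells with importance reweighting."""
    rng = np.random.default_rng(seed)
    edges = np.linspace(lo, hi, n_cells + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w_cell = np.exp(log_density(centers))
    w_cell /= w_cell.sum()
    cells = rng.choice(n_cells, size=n_particles, p=w_cell)  # coarse resampling
    x = rng.uniform(edges[cells], edges[cells + 1])          # fine refinement
    # Correct the coarse approximation: weight by p(x) / p(cell center).
    logw = log_density(x) - log_density(centers[cells])
    w = np.exp(logw - logw.max())
    return x, w / w.sum()

def log_mix(x):  # unnormalized equal-weight mixture of N(-3, 1) and N(3, 1)
    return np.logaddexp(-0.5 * (x + 3) ** 2, -0.5 * (x - 3) ** 2)

x, w = coarse_to_fine_sample(log_mix, -8.0, 8.0)
mean = float(np.sum(w * x))
left_mass = float(np.sum(w[x < 0]))
```

Because the coarse stage already places mass on both modes, the fine stage never has to rediscover a mode by local moves, which is the variance-reduction and anti-impoverishment effect described above.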

5. Cross-Modal, Multi-Stage Fusion and Real-World Robotics

In cross-modal fusion pipelines, coarse-to-fine initialization establishes robust alignment across sensor modalities. LiDAR-VGGT employs a two-stage coarse-to-fine pipeline: a global Sim(3) fit between vision foundation model poses and LiDAR-inertial odometry provides a coarse metric embedding via Umeyama alignment, PCA-based degeneracy analysis, and RANSAC scale consensus. This coarse initialization anchors the vision-based map in the real-world scale and orientation, yielding a strong starting point for subsequent fine ICP-style fusion—critically shrinking the search space, stabilizing the joint optimization, and preventing the downstream process from diverging due to scale mismatch (Wang et al., 3 Nov 2025). In camera-based vehicle localization using HD maps, a coarse GPS-based pose serves as the initial hypothesis, refined by fine-scale grid search and semantic photometric alignment, seeding accurate pose tracking and graph optimization for robust localization in structural scenes (Guo et al., 2021).
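The global similarity fit at the heart of such coarse alignment can be written in closed form via Umeyama's method. The sketch below is the standard Umeyama (1991) estimator, not the exact LiDAR-VGGT implementation: the degeneracy analysis and RANSAC consensus mentioned above are omitted.

```python
import numpy as np

def umeyama(src, dst):
    """Closed-form similarity alignment: find scale s, rotation R, translation t
    minimizing ||dst - (s * R @ src + t)|| over N x 3 point sets."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                 # cross-covariance of centered sets
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                           # guard against reflections
    R = U @ S @ Vt
    var_s = (xs ** 2).sum() / len(src)         # variance of source points
    s = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t

# Recover a known similarity transform from noiseless correspondences.
rng = np.random.default_rng(0)
src = rng.normal(size=(6, 3))
th = np.deg2rad(30)
R_true = np.array([[np.cos(th), -np.sin(th), 0.0],
                   [np.sin(th),  np.cos(th), 0.0],
                   [0.0,         0.0,        1.0]])
dst = 2.0 * src @ R_true.T + np.array([1.0, 2.0, 3.0])
s, R, t = umeyama(src, dst)
```

In a fusion pipeline, this coarse Sim(3) estimate anchors the vision map to metric scale, after which the fine ICP-style stage only has to correct small residuals.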

6. Algorithmic Templates and Pseudocode

Methodologies implementing coarse-to-fine initialization typically comprise:

  • Construction of a (semantic, spatial, or representational) hierarchy: class splits, spatial pyramids, label trees, or abstraction ladders.
  • Transfer of parameters, weights, or prototypes from coarse to fine entities, optionally including normalization or bias adjustment (see CCDA, HypKnowe).
  • Sequential or recursive refinement: each stage operates using and improving the initialization from its predecessor.
  • Specialized initialization rules, as in the copy-and-split for classifier expansion (Shenaj et al., 2022), or weight-bridging with SparsePCA for embeddings (Wang et al., 21 Jan 2025).
  • Pseudocode often embodies (a) a loop on granularity/stage, (b) weight/parameter transfer, (c) execution of optimization/refinement subroutines initialized by the previous stage.

A schematic for classifier initialization is given by:

from math import log

for parent_class in splitting_classes:
    for child_class in split_children[parent_class]:
        # New fine classes start from their parent's decision boundary.
        classifier_weights[child_class] = classifier_weights[parent_class]
        # Shift the bias so the parent's probability mass splits evenly among children.
        classifier_biases[child_class] = (classifier_biases[parent_class]
                                          - log(len(split_children[parent_class])))
# Classes that are not split keep their existing weights and biases unchanged.

7. Empirical Impact and Considerations

Across all surveyed domains, coarse-to-fine initialization delivers faster convergence, higher accuracy, more reliable optimization, and reduced overfitting or instability—especially when transitioning across hierarchical levels, introducing new fine-grained classes, or solving problems with high-dimensional or multimodal solution spaces. Disabling key steps (e.g., the bias-shift rule in CCDA or the hierarchical weight initialization in HypKnowe) consistently degrades final performance (Shenaj et al., 2022, Dai et al., 23 Sep 2025). The method is robust to parameter choices and adapts readily to a variety of network architectures and problem settings, but does assume that coherent coarse-to-fine mappings (e.g., strict tree hierarchies, invertible abstractions) can be constructed for the given application. Empirical ablations confirm that success hinges on both explicit initialization and, in learning scenarios, preserving and exploiting the hierarchical priors across stages.


In summary, coarse-to-fine initialization serves as a unifying paradigm for multilevel transfer, refinement, and alignment across deep learning, probabilistic inference, and control, leveraging hierarchical structure for robustness, efficiency, and enhanced solution quality (Shenaj et al., 2022, Wang et al., 21 Jan 2025, Dai et al., 23 Sep 2025, Stretcu et al., 2021, Jing et al., 2018, Hu et al., 2018, Wang et al., 3 Nov 2025, Guo et al., 2021, Mok et al., 2022, Stuhlmüller et al., 2015, Vandekerckhove et al., 2010, Han et al., 2022).
