Mechanisms that cause thicket formation during pretraining
Explain the mechanisms by which pretraining produces a local "thicket" of task-improving solutions around the pretrained weights of large models, i.e., a high density and diversity of beneficial parameter perturbations. Determine whether exposure to diverse task distributions during pretraining is the critical factor, and identify which aspects of the pretraining objective or learning dynamics give rise to this phenomenon.
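To make "density" and "diversity" of beneficial perturbations concrete, the toy sketch below shows one possible way to estimate them. It samples isotropic Gaussian perturbations around a reference weight vector and records, for each task, the fraction of samples that lower that task's loss (density) and which sample is best for that task (a crude diversity signal). This is an illustrative assumption, not the measurement protocol of Gan et al. The linear-regression tasks, the functions `task_loss` and `perturbation_density`, and the parameters `sigma` and `n_samples` are all hypothetical.

```python
import numpy as np

# Hypothetical toy setup: linear models with MSE loss stand in for a large
# pretrained network and its downstream tasks.

def task_loss(w, X, y):
    """Mean squared error of the linear model w on task data (X, y)."""
    residual = X @ w - y
    return float(np.mean(residual ** 2))

def perturbation_density(w0, tasks, sigma, n_samples, seed=0):
    """For each task, estimate the fraction of perturbed weights w0 + eps,
    with eps ~ N(0, sigma^2 I), whose loss is strictly lower than at w0.

    Also returns, per task, the index of the best sampled perturbation.
    Different winning indices across tasks hint that the samples contain
    distinct task "experts" (a crude proxy for diversity).
    """
    rng = np.random.default_rng(seed)
    # One shared pool of perturbations, reused for every task.
    eps = rng.normal(0.0, sigma, size=(n_samples, w0.shape[0]))
    densities, best_idx = [], []
    for X, y in tasks:
        base = task_loss(w0, X, y)
        losses = np.array([task_loss(w0 + e, X, y) for e in eps])
        densities.append(float(np.mean(losses < base)))
        best_idx.append(int(np.argmin(losses)))
    return densities, best_idx

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d = 20
    w0 = rng.normal(size=d)  # stand-in for "pretrained" weights
    tasks = []
    for _ in range(3):
        X = rng.normal(size=(100, d))
        w_task = w0 + 0.5 * rng.normal(size=d)  # each task optimum lies near w0
        tasks.append((X, X @ w_task))
    densities, best = perturbation_density(w0, tasks, sigma=0.05, n_samples=500)
    print("density per task:", densities)
    print("best perturbation per task:", best)
```

In this convex toy, roughly half of very small perturbations improve any task whose optimum differs from `w0`, so the sketch only illustrates the bookkeeping. It does not reproduce the thicket phenomenon itself, which concerns the non-convex landscape around real pretrained weights.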
References
Our results characterize properties of the pretrained landscape, but do not fully explain the mechanisms by which these properties arise. Is this also the critical factor in developing thickets in LLMs and other large models? What exactly is it about the pretraining objective, or learning dynamics, that creates thickets?
— Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights
(2603.12228 - Gan et al., 12 Mar 2026) in Limitations, paragraph "Exactly When and Why Does Pretraining Enter the Thicket Regime?"