Mechanisms that cause thicket formation during pretraining

Explain the mechanisms by which pretraining produces a local "thicket" of task-improving solutions (that is, a high density and diversity of beneficial parameter perturbations) around the pretrained weights of large models. Determine whether exposure to diverse task distributions during pretraining is the critical factor, and identify which aspects of the pretraining objective or learning dynamics create this phenomenon.

Background

The source paper documents that the density and diversity of task-improving solutions around pretrained weights increase with model scale, which makes simple random perturbation selection and ensembling effective for post-training. A toy experiment on 1D signals suggests that pretraining on diverse functions can induce a thicket-like neighborhood.
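To make "solution density" and "random perturbation selection and ensembling" concrete, the following is a minimal sketch on a toy linear-regression task. It is not the paper's procedure: the Gaussian perturbation scale `sigma`, the sample count, the `top_k` cutoff, the stand-in "pretrained" weights, and prediction averaging as the ensembling rule are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a downstream task: linear regression with a known target.
X = rng.normal(size=(64, 4))
target_w = np.ones(4)
y = X @ target_w

def predict(w):
    """Predictions of the toy linear model with weights w."""
    return X @ w

def task_loss(preds):
    """Mean squared error of predictions against the task labels."""
    return float(np.mean((preds - y) ** 2))

# Hypothetical "pretrained" weights: offset from the task solution.
pretrained_w = np.zeros(4)
base_loss = task_loss(predict(pretrained_w))

# Assumed hyperparameters (illustrative only).
n_samples, sigma, top_k = 1000, 0.5, 5

# Sample random Gaussian perturbations around the pretrained weights.
perturbed = pretrained_w + rng.normal(scale=sigma, size=(n_samples, 4))
losses = np.array([task_loss(predict(w)) for w in perturbed])

# Solution density: fraction of perturbations that improve the task loss.
density = float(np.mean(losses < base_loss))

# Selection: keep the top_k perturbations with the lowest loss.
best = perturbed[np.argsort(losses)[:top_k]]

# Ensembling (assumed rule): average the predictions of the selected models.
ensemble_preds = np.mean([predict(w) for w in best], axis=0)
ensemble_loss = task_loss(ensemble_preds)

print(f"density={density:.3f} base={base_loss:.3f} ensemble={ensemble_loss:.3f}")
```

In this convex toy problem the improving region is small and simply shaped, so the sketch only illustrates how density can be measured and how selection and ensembling operate. It says nothing about why such regions become dense and diverse in large pretrained models, which is the open question here.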

Despite these observations, the authors acknowledge that they do not yet understand the precise causal mechanisms in LLMs. It remains unclear whether diversity in pretraining tasks is essential or whether other characteristics of objectives and learning dynamics drive thicket formation.

References

Our results characterize properties of the pretrained landscape, but do not fully explain the mechanisms by which these properties arise. Is this also the critical factor in developing thickets in LLMs and other large models? What exactly is it about the pretraining objective, or learning dynamics, that creates thickets?

Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights (2603.12228 - Gan et al., 12 Mar 2026) in Limitations, paragraph "Exactly When and Why Does Pretraining Enter the Thicket Regime?"