Scaling of intermediate-temperature sampling results to larger transformers and ambitious protein-structure tasks

Determine how the empirical findings from intermediate-temperature sampling of the parameter space of small one- and four-block transformers, trained on synthetic protein sequences, scale to larger transformer architectures and to more ambitious tasks, such as predicting the structures of all known proteins.

Background

The paper studies transformers trained on synthetic protein sequences within a statistical mechanics framework, sampling their parameters across a range of temperatures via Langevin dynamics. It finds that transformers exhibit a broad intermediate-temperature regime of superior generalization, in contrast to the sharper transition observed in comparable feedforward networks.
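
To make the temperature-controlled sampling concrete, the following is a minimal numpy sketch of unadjusted overdamped Langevin dynamics targeting p(theta) proportional to exp(-L(theta)/T) on a flat parameter vector. It is not the authors' implementation: the paper samples the full transformer weights, whereas the names (grad_loss, step_size, burn_in) and the numerical settings here are hypothetical placeholders chosen only for illustration.

    import numpy as np

    def langevin_sample(grad_loss, theta0, temperature, step_size=1e-4,
                        n_steps=50_000, burn_in=10_000, rng=None):
        """Draw parameter samples from p(theta) ~ exp(-L(theta)/T) via
        unadjusted (overdamped) Langevin dynamics.

        grad_loss   -- callable returning dL/dtheta for a flat parameter vector
        temperature -- sampling temperature T; T -> 0 reduces to gradient
                       descent, while intermediate T explores wider regions
        """
        rng = np.random.default_rng() if rng is None else rng
        theta = np.asarray(theta0, dtype=float).copy()
        noise_scale = np.sqrt(2.0 * step_size * temperature)
        samples = []
        for step in range(n_steps):
            theta = (theta
                     - step_size * grad_loss(theta)
                     + noise_scale * rng.standard_normal(theta.shape))
            if step >= burn_in:
                samples.append(theta.copy())
        return np.array(samples)

    # Toy check on a quadratic loss L(theta) = 0.5 * ||theta||^2, whose
    # Langevin samples should have per-coordinate variance close to T.
    if __name__ == "__main__":
        samples = langevin_sample(grad_loss=lambda th: th,
                                  theta0=np.zeros(4), temperature=0.1)
        print(samples.var(axis=0))  # each entry should be roughly 0.1

The toy quadratic loss is included only as a sanity check of the sampler; in the paper's setting the loss would be the training objective of the transformer itself.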

All experiments are performed on relatively small transformers (one- and four-block variants) and on a synthetic dataset derived from a single protein’s contact map. The authors explicitly note that they cannot yet assess whether these results extend to larger models and to more ambitious, real-world tasks, motivating the open scaling question.

References

Of course, we cannot yet determine how our results scale for larger transformers and more ambitious tasks, such as learning the structure of all known proteins.

Sampling at intermediate temperatures is optimal for training large language models in protein structure prediction (2603.29529 - Ghiringhelli et al., 31 Mar 2026) in Discussion and Conclusions