Scaling behavior of the Free Transformer at larger sizes
Investigate how the Free Transformer behaves when scaled to larger parameter counts and trained on substantially larger datasets, and assess how its performance and training dynamics change with scale.
References
Finally, the behavior in larger scales, both in parameter count and dataset size, remains to be investigated.
— The Free Transformer
(2510.17558 - Fleuret, 20 Oct 2025) in Section 6 (Conclusion)