Scalability of reported findings to larger language models
Determine whether the empirical findings reported for 8–14 billion parameter models extend to language models larger than 14 billion parameters. Specifically, verify by systematic, cross-family experiments whether conventional instruction tuning still degrades in-context steerability and distributional alignment at larger scales, and whether Spectrum Tuning still improves steerability, output coverage, and alignment.
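As a rough illustration of what such a scaling check could look like, the sketch below scores sampled outputs from several checkpoints against a known target distribution. The proxies used here (support coverage and 1 − total variation distance) and the checkpoint names are illustrative assumptions, not the metrics or models defined in the paper.

```python
# Minimal sketch of a cross-checkpoint evaluation, assuming hypothetical proxy metrics:
# coverage = fraction of the target support produced at least once,
# alignment = 1 - total variation distance to the target distribution.
from collections import Counter
from typing import Dict, List


def coverage(samples: List[str], target_support: List[str]) -> float:
    """Fraction of the target support that appears at least once in the samples."""
    seen = set(samples)
    return sum(1 for x in target_support if x in seen) / len(target_support)


def tv_distance(samples: List[str], target: Dict[str, float]) -> float:
    """Total variation distance between the empirical sample distribution and the target."""
    counts = Counter(samples)
    n = len(samples)
    support = set(target) | set(counts)
    return 0.5 * sum(abs(counts.get(x, 0) / n - target.get(x, 0.0)) for x in support)


def evaluate(samples_by_model: Dict[str, List[str]], target: Dict[str, float]) -> None:
    """Report coverage and alignment per model variant."""
    support = list(target)
    for name, samples in samples_by_model.items():
        cov = coverage(samples, support)
        align = 1.0 - tv_distance(samples, target)
        print(f"{name:16s} coverage={cov:.2f} alignment={align:.2f}")


if __name__ == "__main__":
    # Toy task: the prompt asks for a die roll; the target is uniform over six faces.
    target = {str(i): 1 / 6 for i in range(1, 7)}
    # Placeholder samples; in a real run these would be generations from base,
    # instruction-tuned, and Spectrum-tuned checkpoints at each scale under test
    # (the 70B names here are hypothetical).
    samples_by_model = {
        "base-70B": ["1", "2", "3", "4", "5", "6", "2", "5"],
        "instruct-70B": ["3", "3", "3", "3", "4", "3", "3", "3"],
        "spectrum-70B": ["1", "6", "2", "4", "3", "5", "2", "6"],
    }
    evaluate(samples_by_model, target)
```

Repeating such a comparison across model families and parameter counts above 14B would indicate whether the reported gaps persist, narrow, or widen with scale.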
References
We have no reason to believe that our findings will not scale to larger model sizes, but this remains to be empirically verified.
— Spectrum Tuning: Post-Training for Distributional Coverage and In-Context Steerability
(arXiv:2510.06084, Sorensen et al., 7 Oct 2025), in Limitations, Section “Experiments performed only on ≤14B parameter models”