
Optimal Newton–Schulz polynomial coefficients for AOL-preconditioned Turbo-Muon

Develop optimal, iteration-dependent polynomial coefficients for the quintic Newton–Schulz iteration in Turbo-Muon, where the iteration is preceded by Almost Orthogonal Layer (AOL) preconditioning, for a fixed iteration budget. Then assess whether these coefficients approximate the polar factor better than coefficients designed for the non-preconditioned case.
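The iteration being tuned can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the Turbo-Muon implementation: the AOL step is written as the diagonal rescaling from the AOL construction (row i of the wide matrix X is divided by the square root of the i-th absolute row sum of X X^T, which bounds the spectral norm by 1), the default schedule simply repeats Muon's commonly cited fixed triple (3.4445, -4.7750, 2.0315), and all function names are hypothetical. Because a non-uniform diagonal rescaling generally changes the polar factor itself, the sketch measures the worst deviation of the output's singular values from 1, which is the quantity the polynomial coefficients directly control.

```python
import numpy as np

# Muon's commonly cited fixed quintic coefficients (a, b, c). A tailored,
# iteration-dependent schedule would supply a different triple per step.
MUON_COEFFS = (3.4445, -4.7750, 2.0315)


def aol_precondition(X, eps=1e-12):
    """Assumed AOL-style rescaling of a wide matrix X (rows <= cols).

    Row i is divided by sqrt(sum_j |X X^T|_ij); by the AOL bound the
    result has spectral norm at most 1. Not taken from the Turbo-Muon code.
    """
    row_sums = np.abs(X @ X.T).sum(axis=1)
    d = 1.0 / np.sqrt(np.maximum(row_sums, eps))
    return d[:, None] * X


def frobenius_normalize(X, eps=1e-12):
    """Non-preconditioned baseline: divide by the Frobenius norm."""
    return X / (np.linalg.norm(X) + eps)


def newton_schulz(G, schedule, precondition=aol_precondition):
    """Quintic Newton-Schulz with one (a, b, c) triple per iteration:
    X <- a X + b (X X^T) X + c (X X^T)^2 X."""
    X = np.asarray(G, dtype=np.float64)
    transposed = X.shape[0] > X.shape[1]
    if transposed:  # iterate on the wide orientation so X X^T is small
        X = X.T
    X = precondition(X)
    for a, b, c in schedule:
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X


def orthogonality_error(X):
    """Worst deviation of the singular values of X from 1."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.max(np.abs(1.0 - s)))


rng = np.random.default_rng(0)
G = rng.standard_normal((128, 64))
X0 = aol_precondition(G.T)  # preconditioned input before any iteration
X5 = newton_schulz(G, [MUON_COEFFS] * 5)
print(orthogonality_error(X0), orthogonality_error(X5))
```

Passing precondition=frobenius_normalize, or a different schedule of the same length, gives the comparison the problem asks for: one error metric evaluated for each (preconditioner, coefficient set) pair at a fixed iteration count.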


Background

The paper compares the coefficient sets used in Muon and Muon+ (including Polar Express) and observes that coefficients optimized for the non-preconditioned case can degrade performance when combined with AOL preconditioning. The authors hypothesize that the assumptions these coefficient sets make about the minimum singular value of their input may be incompatible with AOL's rescaling.
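One way to make the minimum-singular-value hypothesis concrete: if X = U S V^T, each quintic step returns U p(S) V^T with p(s) = a s + b s^3 + c s^5, so a coefficient schedule is judged entirely by the scalar composition of its polynomials over the interval assumed to contain the input's singular values. A schedule tuned for the interval that Frobenius normalization produces is not guaranteed to suit the interval that AOL scaling produces. The NumPy sketch below computes both intervals for a random matrix and the worst-case scalar error of one schedule on each; the helper names are hypothetical, the AOL formula is the same assumed diagonal rescaling as elsewhere, and Muon's fixed triple is used only as an example schedule.

```python
import numpy as np

MUON_COEFFS = (3.4445, -4.7750, 2.0315)  # example schedule entry


def composed_map(schedule, s):
    """Apply p_k(s) = a s + b s^3 + c s^5 for each (a, b, c) in order."""
    s = np.asarray(s, dtype=np.float64)
    for a, b, c in schedule:
        s = a * s + b * s**3 + c * s**5
    return s


def worst_case_error(schedule, lo, hi, n=4001):
    """max |1 - phi(s)| over a dense grid of [lo, hi]."""
    grid = np.linspace(lo, hi, n)
    return float(np.max(np.abs(1.0 - composed_map(schedule, grid))))


def matrix_ns(X, schedule):
    """Matrix form of the same iteration, used to check the scalar view."""
    for a, b, c in schedule:
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X


def sigma_interval(X):
    """Smallest and largest singular value of X."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(s.min()), float(s.max())


rng = np.random.default_rng(1)
X = rng.standard_normal((32, 80))
frob = X / np.linalg.norm(X)
aol = X / np.sqrt(np.abs(X @ X.T).sum(axis=1))[:, None]  # assumed AOL form

schedule = [MUON_COEFFS] * 5
for name, Y in (("frobenius", frob), ("aol", aol)):
    lo, hi = sigma_interval(Y)
    print(name, (lo, hi), worst_case_error(schedule, lo, hi))
```

Running the same comparison on real gradients, rather than a random matrix, would be one way to probe whether a schedule's built-in singular-value assumptions survive AOL's rescaling.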

The authors explicitly leave the computation of Newton–Schulz coefficients tailored to Turbo-Muon's AOL preconditioning as future work, which makes this a concrete open problem in algorithm-specific parameter optimization.
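A possible starting point is a greedy, per-iteration minimax design in the spirit of Polar Express: fix an interval [lo, hi] assumed to contain the singular values after preconditioning, choose the odd quintic that minimizes max |1 - p(s)| on it, replace the interval by the image of p, and repeat for the chosen iteration budget. The sketch below is ours, not the paper's, and not necessarily how Polar Express computes its coefficients: each minimax fit is solved approximately on a discrete grid with Lawson's iteratively reweighted least squares, no safety margins are added, and the AOL-specific starting interval is left as an input. The function names are hypothetical and the interval in the example is a placeholder, not a measured value.

```python
import numpy as np


def fit_minimax_quintic(lo, hi, n=2000, iters=1000):
    """Approximately minimize max_{s in [lo, hi]} |1 - (a s + b s^3 + c s^5)|
    on a discrete grid using Lawson's iteratively reweighted least squares.
    Returns (a, b, c) and the min/max of the fitted polynomial on the grid."""
    s = np.unique(np.concatenate([np.linspace(lo, hi, n), np.geomspace(lo, hi, n)]))
    basis = np.stack([s, s**3, s**5], axis=1)
    w = np.full(s.size, 1.0 / s.size)  # first pass is ordinary least squares
    for _ in range(iters):
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(basis * sw[:, None], sw, rcond=None)
        resid = np.abs(1.0 - basis @ coef)
        w = w * resid  # shift weight toward points with large error
        w /= w.sum()
    p = basis @ coef
    return tuple(float(v) for v in coef), float(p.min()), float(p.max())


def greedy_schedule(lo, hi, steps):
    """Build one coefficient triple per iteration for a fixed budget."""
    schedule, errors = [], []
    for _ in range(steps):
        coef, lo, hi = fit_minimax_quintic(lo, hi)  # next interval = image of p
        schedule.append(coef)
        errors.append(max(1.0 - lo, hi - 1.0))  # worst error after this step
    return schedule, errors


# Placeholder interval: the range AOL actually produces would have to be
# estimated, e.g. from singular values of AOL-scaled gradients.
schedule, errors = greedy_schedule(1e-3, 1.0, steps=5)
for k, (coef, err) in enumerate(zip(schedule, errors), start=1):
    print(k, np.round(coef, 4), round(err, 4))
```

Feeding such a schedule into an AOL-preconditioned Newton–Schulz loop, and comparing it against a schedule fitted for the Frobenius-normalized interval at the same iteration count, would be one way to run the assessment the problem describes; whether this helps in practice is exactly what remains open.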

References

Therefore, we leave the computation of optimal coefficients for Turbo-Muon as potential future work.

Boissin et al., “Turbo-Muon: Accelerating Orthogonality-Based Optimization with Pre-Conditioning”, arXiv:2512.04632, 4 Dec 2025. Appendix, section “About the tuning of Newton-Schulz coefficients”, subsection “Ablation”.