Multi-step recovery and convergence for exact Muon iterates
Establish that, in the linear associative memory model with Gaussian embeddings and power-law item frequencies p_i ∝ i^{-α} (α > 1), the exact Muon iterates W_{t+1} = W_t + η h_{λ_t}(G_t), where G_t = -∇_W L(W_t; 𝔅_t) is the negative gradient on the mini-batch 𝔅_t of size B and h_λ(z) = z / √(z^2 + λ^2), achieve the same multi-step recovery and convergence rates as those proved under the thresholded-gradient approximation.
Concretely, let d_t = Θ̃(min{d^{2 - (1 - 1/(2α))^t}, B^{1/α}}). For step size η ≍ √d and thresholds λ_t = Θ̃(d_{t+1}^{-α} √d), prove that after t steps all items of rank at most d_t are recovered with high probability and that the loss satisfies L(W_t) ≤ Õ(d_t^{1-α}).
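The rate statement can be tabulated numerically. The sketch below drops every constant and logarithmic factor hidden in Θ̃ and Õ, so it only illustrates how the exponents evolve with t; the function names and the sample values of d, B and α are hypothetical. It also checks one heuristic reading of the loss bound, namely that the probability mass of items ranked beyond d_t scales as d_t^{1-α} under p_i ∝ i^{-α}; this reading is an assumption for illustration, not the proof's argument.

```python
import math


def recovered_rank(d: float, B: float, alpha: float, t: int) -> float:
    """d_t = min{d^(2 - (1 - 1/(2 alpha))^t), B^(1/alpha)}, constants and logs dropped."""
    if alpha <= 1:
        raise ValueError("the power law p_i ∝ i^(-alpha) needs alpha > 1")
    exponent = 2.0 - (1.0 - 1.0 / (2.0 * alpha)) ** t
    return min(d ** exponent, B ** (1.0 / alpha))


def schedules(d: float, B: float, alpha: float, t: int) -> tuple[float, float]:
    """Hypothetical eta = sqrt(d) and lambda_t = d_{t+1}^(-alpha) * sqrt(d), constants dropped."""
    d_next = recovered_rank(d, B, alpha, t + 1)
    return math.sqrt(d), d_next ** (-alpha) * math.sqrt(d)


def loss_scale(d_t: float, alpha: float) -> float:
    """Scale d_t^(1 - alpha) appearing in the bound L(W_t) <= O~(d_t^(1 - alpha))."""
    return d_t ** (1.0 - alpha)


def tail_mass(k: int, alpha: float, n: int) -> float:
    """Probability mass of items with rank > k under p_i ∝ i^(-alpha), truncated at n items."""
    weights = [i ** (-alpha) for i in range(1, n + 1)]  # weights[0] is rank 1
    return math.fsum(weights[k:]) / math.fsum(weights)


if __name__ == "__main__":
    d, B, alpha = 1024.0, 1e12, 2.0  # illustrative values only
    for t in range(6):
        d_t = recovered_rank(d, B, alpha, t)
        eta, lam = schedules(d, B, alpha, t)
        print(f"t={t} d_t={d_t:.1f} loss~{loss_scale(d_t, alpha):.2e} "
              f"eta={eta:.1f} lambda_t={lam:.2e}")
```

Because 1 - 1/(2α) lies in (1/2, 1) for α > 1, the exponent 2 - (1 - 1/(2α))^t increases from 1 toward 2, so up to logarithmic factors d_t grows from d and saturates at min{d^2, B^{1/α}}.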
References
The recovery and convergence rates of Theorem~\ref{thm:multi} also hold for the exact Muon iterates $\bW_{t+1} = \bW_t + \eta h_{\lam_t}(\bG_t)$.
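A minimal NumPy sketch of one exact iterate follows. It assumes, as is usual for Muon-style updates but not stated explicitly here, that h_λ acts on a matrix through its singular values, which gives the equivalent closed form h_λ(G) = G(GᵀG + λ^2 I)^{-1/2}. The hard spectral threshold (singular values above λ mapped to 1, the rest to 0) is only a hypothetical stand-in for the thresholded-gradient approximation, and all function names are illustrative.

```python
import numpy as np


def h_lambda_spectral(G: np.ndarray, lam: float) -> np.ndarray:
    """Apply h_lam(z) = z / sqrt(z^2 + lam^2) to each singular value of G (assumed extension)."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return (U * (s / np.sqrt(s**2 + lam**2))) @ Vt  # U * v scales column j by v[j]


def h_lambda_closed_form(G: np.ndarray, lam: float) -> np.ndarray:
    """Same map as G (G^T G + lam^2 I)^(-1/2), computed from an eigendecomposition."""
    n = G.shape[1]
    w, Q = np.linalg.eigh(G.T @ G + lam**2 * np.eye(n))
    return G @ (Q * w**-0.5) @ Q.T


def threshold_spectral(G: np.ndarray, lam: float) -> np.ndarray:
    """Hypothetical hard-threshold counterpart: keep singular directions with sigma > lam at unit scale."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return (U * (s > lam).astype(G.dtype)) @ Vt


def exact_muon_step(W: np.ndarray, G: np.ndarray, eta: float, lam: float) -> np.ndarray:
    """One exact iterate W_{t+1} = W_t + eta * h_lam(G_t), with G_t the negative gradient."""
    return W + eta * h_lambda_spectral(G, lam)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, lam = 64, 1.0
    G = rng.standard_normal((d, d))  # random stand-in for a negative mini-batch gradient
    s = np.linalg.svd(G, compute_uv=False)
    gap = np.abs(s / np.sqrt(s**2 + lam**2) - (s > lam))
    far = np.abs(np.log(s / lam)) > np.log(10.0)  # singular values at least 10x away from lam
    print("max gap away from lam:", np.max(gap[far], initial=0.0))
    print("max gap overall:", gap.max())
    W_next = exact_muon_step(np.zeros((d, d)), G, eta=np.sqrt(d), lam=lam)
```

Under either map, singular values far above λ_t land near 1 and those far below it land near 0; the two updates differ mainly on singular values within a constant factor of λ_t.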