Edge-of-chaos initialization via vanishing angle Lyapunov exponent (λa = 0)
Determine whether setting the initialization hyperparameters of the deep transformer architecture defined in Section 2 (single-head self-attention followed by a tokenwise multilayer perceptron, with residual connections and layer normalization) so that the angle Lyapunov exponent λa vanishes at the collapsed fixed point of the token-geometry update map constitutes a good initialization for training, in the sense of enabling stable, non-collapsing forward signal propagation in deep, randomly initialized transformers.
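To make the quantity concrete, the sketch below estimates an angle Lyapunov exponent empirically for a toy pre-LN transformer block. This is an illustrative proxy, not the paper's construction: the block parameterization, the residual-branch strengths `beta_a` and `beta_m` (standing in for the initialization hyperparameters), and all function names are assumptions. It tracks the angle between two nearly aligned tokens across freshly sampled random layers, Benettin-style, renormalizing the angle after each layer so the dynamics stay near the collapsed (all-tokens-aligned) fixed point.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Tokenwise layer norm with identity gain/bias, as at initialization.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def block(x, params, beta_a, beta_m):
    # Pre-LN residual block: single-head self-attention, then a tokenwise
    # ReLU MLP with a square hidden layer. beta_a / beta_m are illustrative
    # residual-branch strengths standing in for initialization hyperparameters.
    Wq, Wk, Wv, W1, W2 = params
    d = x.shape[-1]
    h = layer_norm(x)
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    s = q @ k.T / np.sqrt(d)
    a = np.exp(s - s.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)
    x = x + beta_a * (a @ v)
    h = layer_norm(x)
    return x + beta_m * (np.maximum(h @ W1, 0.0) @ W2)

def angle(u, w):
    c = u @ w / (np.linalg.norm(u) * np.linalg.norm(w))
    return np.arccos(np.clip(c, -1.0, 1.0))

def estimate_lambda_a(beta_a, beta_m, d=256, depth=50, eps0=1e-3, seed=0):
    # Benettin-style estimate: start two tokens at angle eps0, push them
    # through random blocks, accumulate the per-layer log angle expansion,
    # and reset the angle to eps0 after each layer so the estimate stays
    # in the linear regime around the collapsed (aligned) fixed point.
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(d)
    p = rng.standard_normal(d)
    p -= (p @ u) / (u @ u) * u                 # perpendicular direction
    w = np.cos(eps0) * u + np.sin(eps0) * np.linalg.norm(u) * p / np.linalg.norm(p)
    x = np.stack([u, w])
    total = 0.0
    for _ in range(depth):
        params = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(5)]
        x = block(x, params, beta_a, beta_m)
        a_out = max(angle(x[0], x[1]), 1e-12)  # guard against exact collapse
        total += np.log(a_out / eps0)
        # Renormalize: reset the angle to eps0 while keeping both norms.
        u, w = x[0], x[1]
        perp = w - (w @ u) / (u @ u) * u
        perp /= np.linalg.norm(perp)
        x[1] = np.linalg.norm(w) * (np.cos(eps0) * u / np.linalg.norm(u)
                                    + np.sin(eps0) * perp)
    return total / depth

# Scan for the edge of chaos, where the estimate crosses zero.
for beta in (0.1, 0.3, 0.5, 1.0):
    print(f"beta={beta}: lambda_a ~ {estimate_lambda_a(beta, beta):.3f}")
```

In this toy setting, a negative estimate indicates contraction onto the collapsed fixed point and a positive one indicates chaotic separation of token directions; scanning the branch strengths for the zero crossing mimics the edge-of-chaos prescription λa = 0 posed above.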
References
Since neither property seems conducive to stable training, a natural conjecture for a good initialization is at the edge of chaos where λa = 0.
— Geometric Dynamics of Signal Propagation Predict Trainability of Transformers (Cowsik et al., arXiv:2403.02579, 5 Mar 2024), Subsection 3.3 (Fixed Points of the Update Map)