Mathematical Explanation for Faster Convergence of Attention-Based Models vs. RNNs
Establish, through formal mathematical analysis, why transformer attention mechanisms often converge faster than recurrent neural networks (RNNs) when trained on natural language processing tasks.
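One candidate entry point for such an analysis, given here only as a minimal sketch and not as a result from the cited paper, is the classical gradient-path argument: in an RNN trained by backpropagation through time, the gradient linking the loss to an early hidden state is a product of per-step Jacobians, whereas self-attention connects any two positions through a path of constant depth. The notation below (hidden states $h_t$, recurrent matrix $W_h$, activation $\sigma$, attention weights $\alpha_{ij}$, value projection $W_V$) is assumed for illustration.

For an RNN with $h_t = \sigma(W_h h_{t-1} + W_x x_t)$, backpropagation through time gives
\[
\frac{\partial \mathcal{L}}{\partial h_1}
= \frac{\partial \mathcal{L}}{\partial h_T}
\prod_{t=2}^{T} \frac{\partial h_t}{\partial h_{t-1}}
= \frac{\partial \mathcal{L}}{\partial h_T}
\prod_{t=2}^{T} \operatorname{diag}\big(\sigma'(a_t)\big)\, W_h ,
\]
so that, with $\gamma = \max_z |\sigma'(z)|$,
\[
\left\| \frac{\partial \mathcal{L}}{\partial h_1} \right\|
\le \big(\gamma\, \|W_h\|\big)^{T-1} \left\| \frac{\partial \mathcal{L}}{\partial h_T} \right\|,
\]
a bound that shrinks or explodes geometrically in the sequence length $T$. For a single self-attention layer with output
\[
y_i = \sum_{k=1}^{T} \alpha_{ik}\, W_V x_k ,
\qquad
\frac{\partial y_i}{\partial x_j} = \alpha_{ij} W_V + \text{(terms from } \partial \alpha_{ik}/\partial x_j\text{)},
\]
every pair of positions $(i, j)$ interacts through a path of depth one, so no factor in the gradient scales geometrically with $T$. A formal convergence comparison could aim to turn this qualitative gap into explicit bounds on gradient norms or on the conditioning of the loss landscape for the two architectures.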
References
It is clear that the attention mechanism has shown its effectiveness. It is at the heart of LLMs such as GPT, LLaMA, and DeepSeek. We have not yet analyzed, from a mathematical perspective, why it converges faster than RNNs.
                — The algebra and the geometry aspect of Deep learning
                
                (arXiv:2510.18862, Aristide, 21 Oct 2025), Section 7: Training Neural Networks for Natural Language Processing Tasks