Convergence Speed of LSTM vs. Simple RNN

Investigate and determine, via a rigorous mathematical analysis of backpropagation dynamics, whether Long Short-Term Memory (LSTM) networks converge faster than simple recurrent neural networks (RNNs) under standard training protocols.

Background

The authors review RNNs and the vanishing/exploding gradient problem, noting that LSTMs introduce gating and a cell state that appear to preserve gradients across long time intervals, thereby mitigating these issues.
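One standard way to make this intuition precise (sketched here in generic notation; this particular formulation is not taken from the paper) is to compare the Jacobian products that arise in backpropagation through time for the two architectures:

```latex
% Simple RNN, h_t = \tanh(W h_{t-1} + U x_t + b): the long-range gradient is a
% product of per-step Jacobians, whose norm tends to shrink or grow geometrically
% in t - k, depending on the spectrum of W and the saturation of tanh.
\[
  \frac{\partial h_t}{\partial h_k}
    = \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}}
    = \prod_{i=k+1}^{t} \operatorname{diag}\bigl(\tanh'(a_i)\bigr)\, W,
  \qquad a_i = W h_{i-1} + U x_i + b.
\]
% LSTM cell state, c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t: the additive
% update makes the dominant factor along the cell-state path
\[
  \frac{\partial c_t}{\partial c_{t-1}} \approx \operatorname{diag}(f_t),
\]
% which stays near the identity whenever the forget gate satisfies f_t \approx 1,
% so gradients along this path need not vanish over long time intervals.
```

This contrast explains why LSTM gradients can be preserved, but by itself it says nothing about the relative speed of convergence under training, which is the open question above.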

Despite this intuition, they explicitly state that they have not provided a mathematical analysis demonstrating faster convergence of LSTMs compared to simple RNNs, leaving a formal comparison of convergence speeds as an unresolved question.
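Since the formal analysis is left open, the only readily available complement is empirical. The following minimal sketch (an assumption-laden illustration, not the paper's method) trains a simple RNN and an LSTM on a toy long-lag recall task and compares their training losses; the task, hyperparameters, and the use of PyTorch's nn.RNN and nn.LSTM are all choices made here for illustration.

```python
# Illustrative comparison of convergence speed: simple RNN vs. LSTM on a toy
# task where the target is the input seen at the first time step, so the
# network must carry information across the whole sequence.
import torch
import torch.nn as nn

torch.manual_seed(0)

T, BATCH, IN_DIM, HID = 50, 64, 4, 32   # sequence length, batch size, dims (assumed)

def make_batch():
    x = torch.randn(BATCH, T, IN_DIM)
    y = x[:, 0, :]                       # recall the first-step input after T steps
    return x, y

class SeqRegressor(nn.Module):
    def __init__(self, cell):
        super().__init__()
        self.rnn = cell(IN_DIM, HID, batch_first=True)
        self.head = nn.Linear(HID, IN_DIM)

    def forward(self, x):
        out, _ = self.rnn(x)             # (BATCH, T, HID)
        return self.head(out[:, -1, :])  # predict from the final hidden state

def train(cell, steps=500, lr=1e-2):
    model = SeqRegressor(cell)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    losses = []
    for _ in range(steps):
        x, y = make_batch()
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

rnn_losses = train(nn.RNN)
lstm_losses = train(nn.LSTM)
print(f"simple RNN final loss: {rnn_losses[-1]:.4f}")
print(f"LSTM final loss:       {lstm_losses[-1]:.4f}")
```

Such an experiment can suggest which model reduces its loss faster on a given task, but it does not settle the question posed above, which asks for a mathematical analysis of the backpropagation dynamics.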

References

It seems that LSTM can preserve gradients across long time intervals, which mitigates the vanishing gradient problem in standard RNNs. We have not analyzed the expression of backpropagation from a mathematical perspective to show that they converge faster than a simple RNN.

The algebra and the geometry aspect of Deep learning (arXiv:2510.18862, Aristide, 21 Oct 2025), Section 7: Recurrent Neural Network (Long Short-Term Memory)