On the Dynamics of Learning Time-Aware Behavior with Recurrent Neural Networks (2306.07125v1)
Abstract: Recurrent Neural Networks (RNNs) have shown great success in modeling time-dependent patterns, but there is limited research on their learned representations of latent temporal features and the emergence of these representations during training. To address this gap, we use timed automata (TA) to introduce a family of supervised learning tasks modeling behavior dependent on hidden temporal variables whose complexity is directly controllable. Building upon past studies from the perspective of dynamical systems, we train RNNs to emulate temporal flipflops, a new collection of TA that emphasizes the need for time-awareness over long-term memory. We find that these RNNs learn in phases: they quickly perfect any time-independent behavior, but they initially struggle to discover the hidden time-dependent features. In the case of periodic "time-of-day" aware automata, we show that the RNNs learn to switch between periodic orbits that encode time modulo the period of the transition rules. We subsequently apply fixed point stability analysis to monitor changes in the RNN dynamics during training, and we observe that the learning phases are separated by a bifurcation from which the periodic behavior emerges. In this way, we demonstrate how dynamical systems theory can provide insights into not only the learned representations of these models, but also the dynamics of the learning process itself. We argue that this style of analysis may provide insights into the training pathologies of recurrent architectures in contexts outside of time-awareness.
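The two technical ingredients described in the abstract, a periodic "time-of-day"-aware flip-flop task driven by a hidden clock and fixed-point stability analysis of the trained RNN, can be illustrated with a short PyTorch sketch. The code below is not the authors' implementation: the task parameters (`PERIOD`, `make_flipflop_batch`), the network size, and the slow-point search (`find_fixed_point`, which minimizes the speed ||F(h) − h||² in the spirit of Sussillo & Barak, 2013, and the FixedPointFinder toolbox cited below) are illustrative assumptions chosen to mirror the setup the abstract describes.

```python
# Minimal sketch (not the authors' code) of (1) a "time-of-day"-aware temporal flip-flop
# task with a hidden clock and (2) fixed-point / slow-point stability analysis of a
# trained RNN via gradient descent on the speed ||F(h) - h||^2 and Jacobian eigenvalues.
import torch

PERIOD = 5                 # hidden period of the transition rules (assumed value)
SEQ_LEN, BATCH = 50, 64

def make_flipflop_batch(seq_len=SEQ_LEN, batch=BATCH, period=PERIOD):
    """Toggle commands arrive at random steps; a command only flips the latent bit
    when the hidden clock (t mod period) equals 0, so the correct output depends on
    a temporal variable the network never observes directly."""
    cmd = (torch.rand(batch, seq_len) < 0.2).float()      # 1 = toggle request
    state = torch.zeros(batch, seq_len)
    bit = torch.zeros(batch)
    for t in range(seq_len):
        gate = 1.0 if t % period == 0 else 0.0            # hidden time-of-day gate
        bit = torch.where(cmd[:, t] * gate > 0, 1.0 - bit, bit)
        state[:, t] = bit
    return cmd.unsqueeze(-1), state.unsqueeze(-1)          # inputs, targets

rnn = torch.nn.RNN(input_size=1, hidden_size=32, batch_first=True)
readout = torch.nn.Linear(32, 1)
opt = torch.optim.Adam(list(rnn.parameters()) + list(readout.parameters()), lr=1e-3)

for step in range(2000):                                   # training loop (sketch)
    x, y = make_flipflop_batch()
    h, _ = rnn(x)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(readout(h), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Freeze the trained weights before probing the autonomous dynamics.
for p in rnn.parameters():
    p.requires_grad_(False)

def step_fn(h, x_const):
    """One application of the recurrent map F(h) under a frozen input x_const."""
    _, h_next = rnn(x_const.view(1, 1, 1), h.view(1, 1, -1))
    return h_next.view(-1)

def find_fixed_point(x_const=torch.zeros(1), iters=5000, lr=1e-2):
    """Minimize the speed q(h) = ||F(h) - h||^2 and report the eigenvalues of the
    Jacobian dF/dh at the resulting slow point."""
    h = torch.randn(32, requires_grad=True)
    opt_h = torch.optim.Adam([h], lr=lr)
    for _ in range(iters):
        q = (step_fn(h, x_const) - h).pow(2).sum()
        opt_h.zero_grad()
        q.backward()
        opt_h.step()
    J = torch.autograd.functional.jacobian(lambda v: step_fn(v, x_const), h.detach())
    eigvals = torch.linalg.eigvals(J)
    return h.detach(), eigvals      # |eigenvalue| > 1 signals instability

h_star, eigs = find_fixed_point()
print("max |eigenvalue| at slow point:", eigs.abs().max().item())
```

Repeating this probe at checkpoints during training is one way to watch for the kind of event the abstract reports: a leading Jacobian eigenvalue crossing the unit circle would mark a bifurcation after which periodic, clock-tracking orbits can appear.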
- Jeffrey L Elman. Finding Structure in Time. Cognitive Science, 14(2):179–211, 1990. doi: https://doi.org/10.1207/s15516709cog1402_1. URL https://onlinelibrary.wiley.com/doi/abs/10.1207/s15516709cog1402_1.
- Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
- Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR, abs/1412.3555, 2014.
- Understanding the computation of time using neural network models. Proceedings of the National Academy of Sciences of the United States of America, 117(19):10530–10540, 2020. ISSN 10916490. doi: 10.1073/pnas.1921609117.
- Rajeev Alur and David L. Dill. A theory of timed automata. Theoretical Computer Science, 126(2):183–235, 1994. ISSN 03043975. doi: 10.1016/0304-3975(94)90010-8.
- Jordan B. Pollack. The Induction of Dynamical Recognizers. Machine Learning, 7(2):227–252, 1991. ISSN 15730565. doi: 10.1023/A:1022651113306.
- Finite State Machines and Recurrent Neural Networks – Automata and Dynamical Systems Approaches. 1998.
- Learning Finite State Machines With Self-Clustering Recurrent Networks. Neural Computation, 5(6):976–990, 1993. ISSN 0899-7667. doi: 10.1162/neco.1993.5.6.976.
- K. Arai and R. Nakano. Stable behavior in a recurrent neural network for a finite state machine. Neural Networks, 13(6):667–680, 2000. ISSN 08936080. doi: 10.1016/S0893-6080(00)00037-X.
- Representing formal languages: A comparison between finite automata and recurrent neural networks. 7th International Conference on Learning Representations, ICLR 2019, pages 1–15, 2019.
- Stability of internal states in recurrent neural networks trained on regular languages. Neurocomputing, 452(1):212–223, 2021. ISSN 18728286. doi: 10.1016/j.neucom.2021.04.058.
- Understanding robust generalization in learning regular languages. In International Conference on Machine Learning, 2022.
- David Sussillo and L. F. Abbott. Generating Coherent Patterns of Activity from Chaotic Neural Networks. Neuron, 63(4):544–557, 2009. ISSN 08966273. doi: 10.1016/j.neuron.2009.07.018. URL http://dx.doi.org/10.1016/j.neuron.2009.07.018.
- David Sussillo and Omri Barak. Opening the black box: Low-dimensional dynamics in high-dimensional recurrent neural networks. Neural Computation, 25(3):626–649, 2013. ISSN 08997667. doi: 10.1162/NECO_a_00409.
- Excitable networks for finite state computation with continuous time recurrent neural networks. arXiv, 2020. ISSN 23318422.
- Spiking neural networks modelled as timed automata: with parameter learning. Natural Computing, 19(1):135–155, 2020. ISSN 15729796. doi: 10.1007/s11047-019-09727-9. URL https://doi.org/10.1007/s11047-019-09727-9.
- Yoshua Bengio, Patrice Simard, and Paolo Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994. doi: 10.1109/72.279181.
- Beyond exploding and vanishing gradients: analysing RNN training using attractors and smoothness. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), volume 108. PMLR, 2020.
- Multiscale deep bidirectional gated recurrent neural networks based prognostic method for complex non-linear degradation systems. Information Sciences, 554, 2021. ISSN 00200255. doi: 10.1016/j.ins.2020.12.032.
- Alberto Bemporad. Recurrent Neural Network Training with Convex Loss and Regularization Functions by Extended Kalman Filtering. arXiv preprint arXiv:2111.02673, 2021. URL http://arxiv.org/abs/2111.02673.
- A mixed deep recurrent neural network for MEMS gyroscope noise suppressing. Electronics (Switzerland), 8(2), 2019. ISSN 20799292. doi: 10.3390/electronics8020181.
- Hyun Su Kim. Development of seismic response simulation model for building structures with semi-active control devices using recurrent neural network. Applied Sciences (Switzerland), 10(11), 2020. ISSN 20763417. doi: 10.3390/app10113915.
- Reaction engineering with recurrent neural network: Kinetic study of Dushman reaction. Chemical Engineering Journal Advances, 9, 2022. ISSN 26668211. doi: 10.1016/j.ceja.2021.100219.
- Physics-Guided and Physics-Explainable Recurrent Neural Network for Time Dynamics in Optical Resonances. arXiv preprint arXiv:2109.09837, September 2021. URL http://arxiv.org/abs/2109.09837.
- Warming-up recurrent neural networks to maximize reachable multi-stability greatly improves learning. arXiv preprint arXiv:2106.01001, 2021. URL http://arxiv.org/abs/2106.01001.
- Kenji Doya. Bifurcations of recurrent neural networks in gradient descent learning. IEEE Transactions on Neural Networks, 1993.
- Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. In 30th International Conference on Machine Learning, ICML 2013 (Part 3), pages 2347–2355, 2013.
- Reverse engineering recurrent neural networks with jacobian switching linear dynamical systems. In Neural Information Processing Systems, 2021.
- Adam Paszke et al. PyTorch: An imperative style, high-performance deep learning library. ArXiv, abs/1912.01703, 2019.
- Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
- Alex Graves. Generating sequences with recurrent neural networks. ArXiv, abs/1308.0850, 2013.
- Matthew D. Golub and David Sussillo. FixedPointFinder: A TensorFlow toolbox for identifying and characterizing fixed points in recurrent neural networks. Journal of Open Source Software, 3:1003, 2018.