Enhancing Continuous Time Series Modelling with a Latent ODE-LSTM Approach (2307.05126v1)

Published 11 Jul 2023 in cs.LG and math.OC

Abstract: Due to their dynamic properties, such as irregular sampling rates and high-frequency sampling, Continuous Time Series (CTS) are found in many applications. Since CTS with irregular sampling rates are difficult to model with standard Recurrent Neural Networks (RNNs), RNNs have been generalised to have continuous-time hidden dynamics defined by a Neural Ordinary Differential Equation (Neural ODE), leading to the ODE-RNN model. Another approach that provides better modelling is the Latent ODE model, which constructs a continuous-time model where a latent state is defined at all times. The Latent ODE model uses a standard RNN as the encoder and a Neural ODE as the decoder. However, since the RNN encoder leads to difficulties with missing data and ill-defined latent variables, a Latent ODE-RNN model has recently been proposed that uses an ODE-RNN model as the encoder instead. Both the Latent ODE and Latent ODE-RNN models are difficult to train due to the vanishing and exploding gradients problem. To overcome this problem, the main contribution of this paper is to propose and illustrate a new model based on a new Latent ODE using an ODE-LSTM (Long Short-Term Memory) network as the encoder -- the Latent ODE-LSTM model. To limit the growth of the gradients, the Norm Gradient Clipping strategy was embedded in the Latent ODE-LSTM model. The performance of the new Latent ODE-LSTM (with and without Norm Gradient Clipping) for modelling CTS with regular and irregular sampling rates is then demonstrated. Numerical experiments show that the new Latent ODE-LSTM performs better than Latent ODE-RNNs and can avoid vanishing and exploding gradients during training.
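To make the two ingredients of the abstract more concrete, the sketch below shows (not the authors' code) how an ODE-LSTM-style encoder cell and Norm Gradient Clipping might look in PyTorch: between irregularly spaced observations the hidden state is evolved by a small neural ODE (fixed-step Euler here for simplicity, rather than an adaptive solver), and at each observation a standard LSTM update is applied. All names (ODELSTMCell, f_ode, n_euler_steps, training_step) and hyperparameters are illustrative assumptions.

```python
# Minimal sketch, assuming a PyTorch setting: an ODE-LSTM-style encoder cell plus
# norm gradient clipping. This is an illustration of the general technique, not the
# paper's exact architecture or training code.
import torch
import torch.nn as nn


class ODELSTMCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int, n_euler_steps: int = 4):
        super().__init__()
        self.lstm = nn.LSTMCell(input_size, hidden_size)
        # f_ode parameterises dh/dt; a small MLP is a simple, common choice.
        self.f_ode = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, hidden_size),
        )
        self.n_euler_steps = n_euler_steps

    def evolve(self, h: torch.Tensor, delta_t: torch.Tensor) -> torch.Tensor:
        # Fixed-step explicit Euler integration of dh/dt = f_ode(h) over the gap delta_t.
        dt = (delta_t / self.n_euler_steps).unsqueeze(-1)  # (batch, 1)
        for _ in range(self.n_euler_steps):
            h = h + dt * self.f_ode(h)
        return h

    def forward(self, x, delta_t, state):
        h, c = state
        h = self.evolve(h, delta_t)   # continuous-time evolution between observations
        h, c = self.lstm(x, (h, c))   # discrete LSTM update at the observation
        return h, c


def training_step(model, optimizer, xs, dts, targets, max_grad_norm=1.0):
    """One optimisation step with norm gradient clipping.

    xs: (T, batch, input_size) observations, dts: (T, batch) time gaps,
    targets: (batch, hidden_size) regression targets for the final hidden state.
    """
    batch = xs.size(1)
    h = xs.new_zeros(batch, model.lstm.hidden_size)
    c = xs.new_zeros(batch, model.lstm.hidden_size)
    for t in range(xs.size(0)):
        h, c = model(xs[t], dts[t], (h, c))
    loss = nn.functional.mse_loss(h, targets)

    optimizer.zero_grad()
    loss.backward()
    # Norm gradient clipping: rescale the global gradient norm to at most
    # max_grad_norm, which limits exploding gradients during training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```

In the Latent ODE-LSTM described in the abstract, a cell of this kind would act as the encoder that produces the approximate posterior over the initial latent state, which a Neural ODE decoder then integrates forward in time; the sketch above only illustrates the cell update and the clipping step.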

Authors (3)
  1. C. Coelho
  2. M. Fernanda P. Costa
  3. L. L. Ferrás