Stable Neural Stochastic Differential Equations in Analyzing Irregular Time Series Data (2402.14989v5)
Abstract: Irregular sampling intervals and missing values in real-world time series data pose challenges for conventional methods, which assume consistent intervals and complete data. Neural Ordinary Differential Equations (Neural ODEs) offer an alternative: neural networks combined with ODE solvers learn continuous latent representations through parameterized vector fields. Neural Stochastic Differential Equations (Neural SDEs) extend Neural ODEs by adding a diffusion term, but this extension is not trivial, particularly in the presence of irregular intervals and missing values. Careful design of the drift and diffusion functions is therefore crucial for maintaining stability and enhancing performance; incautious choices can lead to adverse properties such as the absence of strong solutions, stochastic destabilization, or unstable Euler discretizations, all of which significantly degrade Neural SDEs' performance. In this study, we propose three stable classes of Neural SDEs: Langevin-type SDEs, Linear Noise SDEs, and Geometric SDEs. We then rigorously demonstrate that they maintain excellent performance under distribution shift while effectively preventing overfitting. To assess the effectiveness of our approach, we conduct extensive experiments on four benchmark datasets for interpolation, forecasting, and classification tasks, and analyze the robustness of our methods on 30 public datasets under different missing rates. Our results demonstrate the efficacy of the proposed method in handling real-world irregular time series data.
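The role of the drift and diffusion terms, and of the Euler discretization the abstract warns about, can be illustrated with a minimal Euler–Maruyama simulation of an SDE whose diffusion scales linearly with the state (the "linear noise" pattern). This is a hedged sketch, not the paper's method: `drift` and `diffusion` below are simple hypothetical stand-ins for the learned neural networks.

```python
import numpy as np

def euler_maruyama(drift, diffusion, y0, t0, t1, n_steps, rng):
    """Simulate dy = drift(t, y) dt + diffusion(t, y) dW with the
    Euler-Maruyama scheme, returning the full sample path."""
    dt = (t1 - t0) / n_steps
    y = np.array(y0, dtype=float)
    path = [y.copy()]
    t = t0
    for _ in range(n_steps):
        # Brownian increment: Gaussian with variance dt per dimension.
        dW = rng.normal(0.0, np.sqrt(dt), size=y.shape)
        y = y + drift(t, y) * dt + diffusion(t, y) * dW
        t += dt
        path.append(y.copy())
    return np.stack(path)

# Hypothetical stand-ins for the learned networks: a bounded drift that
# pulls the state toward the origin, and a state-proportional diffusion.
drift = lambda t, y: np.tanh(-y)
diffusion = lambda t, y: 0.1 * y

rng = np.random.default_rng(0)
path = euler_maruyama(drift, diffusion, y0=np.ones(4),
                      t0=0.0, t1=1.0, n_steps=100, rng=rng)
print(path.shape)  # (101, 4): 100 steps plus the initial state, dim 4
```

With a bounded drift and state-proportional noise the discretized path stays finite; replacing them with superlinearly growing coefficients is exactly the setting in which the Euler scheme can diverge, as the abstract notes.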