When and How: Learning Identifiable Latent States for Nonstationary Time Series Forecasting (2402.12767v3)
Abstract: Temporal distribution shifts are ubiquitous in time series data. One popular line of methods assumes that the temporal distribution shift occurs uniformly over time in order to disentangle stationary and nonstationary dependencies. This assumption is difficult to satisfy in practice, however, because we do not know when the distribution shifts occur. To solve this problem, we propose to learn IDentifiable latEnt stAtes (IDEA) that detect when the distribution shifts occur. Beyond that, we further disentangle the stationary and nonstationary latent states under a sufficient-observation assumption to learn how the latent states change. Specifically, we formalize the causal process with environment-irrelevant stationary variables and environment-related nonstationary variables. Under mild conditions, we show that the latent environments and the stationary/nonstationary variables are identifiable. Based on these theoretical results, we devise the IDEA model, which incorporates an autoregressive hidden Markov model to estimate latent environments and modular prior networks to identify latent states. The IDEA model outperforms several recent nonstationary forecasting methods on various benchmark datasets, highlighting its advantages in real-world scenarios.
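To make the architecture named in the abstract concrete, below is a minimal PyTorch sketch of its two components: an autoregressive hidden Markov model that infers discrete latent environments, and modular prior networks in which the stationary latent state uses a shared transition while the nonstationary latent state mixes environment-specific transitions weighted by the HMM posterior. All class names, dimensions, and the simplified filtering recursion are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the IDEA components described in the abstract.
# Module names, shapes, and the filtering recursion are assumptions for
# illustration only; they do not reproduce the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EnvironmentHMM(nn.Module):
    """Autoregressive HMM over K discrete latent environments.

    Emits per-step posterior probabilities from the observed sequence;
    a simplified forward-filtering pass stands in for full inference.
    """

    def __init__(self, x_dim, n_envs, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(x_dim, hidden, batch_first=True)
        self.log_transition = nn.Parameter(torch.zeros(n_envs, n_envs))
        self.emission = nn.Linear(hidden, n_envs)

    def forward(self, x):                      # x: (B, T, x_dim)
        h, _ = self.rnn(x)                     # autoregressive summary
        logits = self.emission(h)              # (B, T, K) emission scores
        probs = [F.softmax(logits[:, 0], dim=-1)]
        for t in range(1, x.size(1)):
            # Propagate the previous posterior through the transition matrix.
            trans = probs[-1] @ F.softmax(self.log_transition, dim=-1)
            probs.append(F.softmax(logits[:, t] + trans.log(), dim=-1))
        return torch.stack(probs, dim=1)       # (B, T, K) env posteriors


class ModularPrior(nn.Module):
    """Priors over latent states: a shared transition for the stationary
    latent z_s, and one environment-specific network per environment for
    the nonstationary latent z_n."""

    def __init__(self, zs_dim, zn_dim, n_envs, hidden=64):
        super().__init__()
        self.stationary = nn.Sequential(
            nn.Linear(zs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * zs_dim))
        self.nonstationary = nn.ModuleList([
            nn.Sequential(nn.Linear(zn_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 2 * zn_dim))
            for _ in range(n_envs)])

    def forward(self, zs_prev, zn_prev, env_probs):
        # Stationary prior is environment-irrelevant.
        mu_s, logvar_s = self.stationary(zs_prev).chunk(2, dim=-1)
        # Nonstationary prior: mixture of env-specific networks,
        # weighted by the HMM's environment posterior.
        outs = torch.stack([m(zn_prev) for m in self.nonstationary], dim=-2)
        out = (env_probs.unsqueeze(-1) * outs).sum(dim=-2)
        mu_n, logvar_n = out.chunk(2, dim=-1)
        return (mu_s, logvar_s), (mu_n, logvar_n)


if __name__ == "__main__":
    B, T, x_dim, K = 4, 16, 8, 3
    hmm = EnvironmentHMM(x_dim, K)
    env = hmm(torch.randn(B, T, x_dim))        # (4, 16, 3)
    prior = ModularPrior(zs_dim=5, zn_dim=5, n_envs=K)
    (mu_s, _), (mu_n, _) = prior(torch.randn(B, 5), torch.randn(B, 5),
                                 env[:, -1])
    print(env.shape, mu_s.shape, mu_n.shape)
```

In a full VAE-style model, these priors would regularize an encoder's posteriors over z_s and z_n via a KL term; that training loop is omitted here for brevity.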