
Feature-aligned N-BEATS with Sinkhorn divergence (2305.15196v3)

Published 24 May 2023 in cs.LG, cs.AI, math.OC, and math.PR

Abstract: We propose Feature-aligned N-BEATS as a domain-generalized time series forecasting model. It is a nontrivial extension of N-BEATS with the doubly residual stacking principle (Oreshkin et al. [45]) into a representation learning framework. In particular, the model revolves around the marginal feature probability measures induced by the intricate composition of residual and feature-extracting operators of N-BEATS in each stack, and aligns them stack-wise via an approximation of an optimal transport distance referred to as the Sinkhorn divergence. The training loss consists of an empirical risk minimization over multiple source domains, i.e., a forecasting loss, and an alignment loss calculated with the Sinkhorn divergence, which allows the model to learn invariant features stack-wise across multiple source data sequences while retaining N-BEATS's interpretable design and forecasting power. Comprehensive experimental evaluations with ablation studies are provided, and the corresponding results demonstrate the proposed model's forecasting and generalization capabilities.
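
The alignment term described in the abstract can be illustrated with a short, self-contained sketch of the Sinkhorn divergence between two empirical feature measures. This is a hedged reconstruction, not the authors' code: the function names, the squared-Euclidean ground cost, the uniform sample weights, and the regularization parameter `eps` are assumptions chosen for clarity; a production implementation would typically use log-domain iterations for numerical stability.

```python
import numpy as np

def sinkhorn_cost(X, Y, eps=0.5, n_iters=200):
    """Entropic-regularized OT cost OT_eps between the empirical measures
    supported on the rows of X and Y (uniform weights), via Sinkhorn iterations."""
    n, m = X.shape[0], Y.shape[0]
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    # Squared-Euclidean ground cost between feature samples.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / eps)                      # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                  # alternating scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]           # approximate transport plan
    return float((P * C).sum())

def sinkhorn_divergence(X, Y, eps=0.5):
    """Debiased Sinkhorn divergence:
    S_eps(X, Y) = OT_eps(X, Y) - (OT_eps(X, X) + OT_eps(Y, Y)) / 2.
    Nonnegative, and zero when the two empirical measures coincide."""
    return (sinkhorn_cost(X, Y, eps)
            - 0.5 * sinkhorn_cost(X, X, eps)
            - 0.5 * sinkhorn_cost(Y, Y, eps))
```

In a model of this kind, a term like `sinkhorn_divergence` applied to the stack-wise features of two source domains would be summed over stacks and domain pairs, weighted by a coefficient, and added to the forecasting loss; the specific weighting and pairing scheme here is an assumption, not taken from the paper.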

References (68)
  1. Generalizing to unseen domains via distribution matching. arXiv preprint arXiv:1911.00804, 2019.
  2. Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media, 2005.
  3. The tourism forecasting competition. International Journal of Forecasting, 27(3):822–844, 2011.
  4. Forecasting across time series databases using recurrent neural networks on groups of similar series: A clustering approach. Expert Systems with Applications, 140:112896, 2020.
  5. H. Bao and S. Sakaue. Sparse regularized optimal transport with deformed q-entropy. Entropy, 24(11):1634, 2022.
  6. A theory of learning from different domains. Machine Learning, 79:151–175, 2010.
  7. Analysis of representations for domain adaptation. Advances in Neural Information Processing Systems, 19, 2006.
  8. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.
  9. BRITS: Bidirectional recurrent imputation for time series. Advances in Neural Information Processing Systems, 31, 2018.
  10. NHITS: Neural hierarchical interpolation for time series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 6989–6997, 2023.
  11. Probabilistic forecasting with temporal convolutional neural network. Neurocomputing, 399:491–501, 2020.
  12. Faster Wasserstein distance estimation with the Sinkhorn divergence. Advances in Neural Information Processing Systems, 33:2257–2269, 2020.
  13. S. Di Marino and A. Gerolin. Optimal transport losses and Sinkhorn algorithm with general convex regularization. arXiv preprint arXiv:2007.00976, 2020.
  14. R. M. Dudley. The speed of mean Glivenko–Cantelli convergence. The Annals of Mathematical Statistics, 40(1):40–50, 1969.
  15. H. Federer. Geometric measure theory. Classics in Mathematics. Springer, 2014.
  16. J. Feydy. Geometric data analysis, beyond convolutions. Applied Mathematics, 2020.
  17. Interpolating between optimal transport and MMD using Sinkhorn divergences. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2681–2690. PMLR, 2019.
  18. POT: Python optimal transport. Journal of Machine Learning Research, 22(1):3571–3578, 2021.
  19. Y. Ganin and V. Lempitsky. Unsupervised domain adaptation by backpropagation. In International Conference on Machine Learning, pages 1180–1189. PMLR, 2015.
  20. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(59):1–35, 2016.
  21. Learning generative models with Sinkhorn divergences. In International Conference on Artificial Intelligence and Statistics, pages 1608–1617. PMLR, 2018.
  22. Scatter component analysis: A unified framework for domain adaptation and domain generalization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(7):1414–1430, 2016.
  23. A kernel two-sample test. Journal of Machine Learning Research, 13(1):723–773, 2012.
  24. Recurrent neural networks for time series forecasting: Current status and future directions. International Journal of Forecasting, 37(1):388–427, 2021.
  25. DATSING: Data augmented time series forecasting with adversarial domain adaptation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pages 2061–2064, 2020.
  26. Domain adaptation for time series forecasting via attention sharing. In International Conference on Machine Learning, pages 10280–10297. PMLR, 2022.
  27. D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
  28. Deep multi-Wasserstein unsupervised domain adaptation. Pattern Recognition Letters, 125:249–255, 2019.
  29. Sliced Wasserstein discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10285–10295, 2019.
  30. Domain generalization with adversarial feature learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5400–5409, 2018.
  31. Extracting relationships by multi-domain matching. Advances in Neural Information Processing Systems, 31, 2018.
  32. Deep domain generalization via conditional invariant adversarial networks. In Proceedings of the European Conference on Computer Vision, pages 624–639, 2018.
  33. Temporal fusion transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting, 37(4):1748–1764, 2021.
  34. SCINet: Time series modeling and forecasting with sample convolution and interaction. Advances in Neural Information Processing Systems, 35:5816–5828, 2022.
  35. U-net inspired transformer architecture for far horizon time series forecasting. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 36–52. Springer, 2022.
  36. S. Makridakis and M. Hibon. The M3-competition: results, conclusions and implications. International Journal of Forecasting, 16(4):451–476, 2000.
  37. The M4 competition: Results, findings, conclusion and way forward. International Journal of Forecasting, 34(4):802–808, 2018.
  38. T. Matsuura and T. Harada. Domain generalization using a mixture of multiple latent domains. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 11749–11756, 2020.
  39. UMAP: Uniform manifold approximation and projection. Journal of Open Source Software, 3(29):861, 2018.
  40. Spectral normalization for generative adversarial networks. In International Conference on Learning Representations, 2018.
  41. Unified deep supervised domain adaptation and generalization. In Proceedings of the IEEE International Conference on Computer Vision, pages 5715–5725, 2017.
  42. Domain generalization via invariant feature representation. In International Conference on Machine Learning, pages 10–18. PMLR, 2013.
  43. V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In International Conference on Machine Learning, pages 807–814, 2010.
  44. Exploiting MMD and Sinkhorn divergences for fair and transferable representation learning. Advances in Neural Information Processing Systems, 33:15360–15370, 2020.
  45. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. In International Conference on Learning Representations, 2019.
  46. Meta-learning framework with applications to zero-shot time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 9242–9250, 2021.
  47. On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pages 1310–1318. PMLR, 2013.
  48. Computational optimal transport: With applications to data science. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.
  49. Dataset shift in machine learning. MIT Press, 2008.
  50. On Wasserstein two-sample testing and related families of nonparametric tests. Entropy, 19(2):47, 2017.
  51. Deep state space models for time series forecasting. Advances in Neural Information Processing Systems, 31, 2018.
  52. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3):1181–1191, 2020.
  53. Wasserstein distance guided representation learning for domain adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
  54. A novel domain adaptation theory with Jensen–Shannon divergence. Knowledge-Based Systems, 257:109808, 2022.
  55. On the benefits of representation regularization in invariance based domain generalization. Machine Learning, 111(3):895–915, 2022.
  56. V. Vapnik. Principles of risk minimization for learning theory. Advances in Neural Information Processing Systems, 4, 1991.
  57. A. Virmaux and K. Scaman. Lipschitz regularity of deep neural networks: analysis and efficient estimation. Advances in Neural Information Processing Systems, 31, 2018.
  58. Generalizing to unseen domains: A survey on domain generalization. IEEE Transactions on Knowledge and Data Engineering, 2022.
  59. M. Wang and W. Deng. Deep visual domain adaptation: A survey. Neurocomputing, 312:135–153, 2018.
  60. ETSformer: Exponential smoothing transformers for time-series forecasting. arXiv preprint arXiv:2202.01381, 2022.
  61. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in Neural Information Processing Systems, 34:22419–22430, 2021.
  62. COT-GAN: Generating sequential data via causal optimal transport. Advances in Neural Information Processing Systems, 33:8798–8809, 2020.
  63. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 11121–11128, 2023.
  64. On learning invariant representations for domain adaptation. In International Conference on Machine Learning, pages 7523–7532. PMLR, 2019.
  65. Domain generalization via entropy regularization. Advances in Neural Information Processing Systems, 33:16096–16107, 2020.
  66. Domain generalization via optimal transport with metric similarity learning. Neurocomputing, 456:469–480, 2021.
  67. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 11106–11115, 2021.
  68. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International Conference on Machine Learning, pages 27268–27286. PMLR, 2022.