Rough Transformers for Continuous and Efficient Time-Series Modelling (2403.10288v1)

Published 15 Mar 2024 in stat.ML, cs.AI, and cs.LG

Abstract: Time-series data in real-world medical settings typically exhibit long-range dependencies and are observed at non-uniform intervals. In such contexts, traditional sequence-based recurrent models struggle. To overcome this, researchers replace recurrent architectures with Neural ODE-based models to handle irregularly sampled data and use Transformer-based architectures to account for long-range dependencies. Despite the success of these two approaches, both incur very high computational costs for input sequences of moderate length and longer. To mitigate this, we introduce the Rough Transformer, a variation of the Transformer model that operates on continuous-time representations of input sequences and incurs significantly reduced computational costs, critical for addressing long-range dependencies common in medical contexts. In particular, we propose multi-view signature attention, which uses path signatures to augment vanilla attention and to capture both local and global dependencies in input data, while remaining robust to changes in sequence length and sampling frequency. We find that Rough Transformers consistently outperform their vanilla attention counterparts while obtaining the benefits of Neural ODE-based models, using a fraction of the computational time and memory resources, on synthetic and real-world time-series tasks.
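The abstract's key mechanism is to replace a long, irregularly sampled sequence with a small number of fixed-size path-signature features computed at both a local and a global scale, and then to run ordinary attention over those features. The sketch below illustrates that idea in plain NumPy. It is not the authors' implementation: the depth-2 signature truncation, the window count, and the function names (signature_depth2, multi_view_signature_tokens) are illustrative assumptions based only on the abstract.

```python
# Minimal sketch (assumptions noted above, not the paper's code) of multi-view
# signature featurisation: per-window local signatures plus a global signature,
# producing a short token sequence for vanilla attention.
import numpy as np

def signature_depth2(path: np.ndarray) -> np.ndarray:
    """Depth-2 truncated signature of a piecewise-linear path.

    path: array of shape (length, channels). Returns the level-1 terms
    (channels,) concatenated with the level-2 terms (channels**2,), built
    segment by segment via Chen's identity.
    """
    d = path.shape[1]
    s1 = np.zeros(d)          # level 1: total increment
    s2 = np.zeros((d, d))     # level 2: iterated integrals
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment with increment delta.
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return np.concatenate([s1, s2.ravel()])

def multi_view_signature_tokens(times, values, num_windows=8):
    """Hypothetical 'multi-view' featurisation: one token per window, holding the
    local signature of that window and the global signature of the path seen so
    far. Time is stacked in as an extra channel so the features reflect the
    irregular sampling grid."""
    path = np.column_stack([times, values])            # (length, 1 + channels)
    edges = np.linspace(0, len(path), num_windows + 1, dtype=int)
    tokens = []
    for a, b in zip(edges[:-1], edges[1:]):
        local = signature_depth2(path[a:max(b, a + 2)])
        global_so_far = signature_depth2(path[:max(b, 2)])
        tokens.append(np.concatenate([local, global_so_far]))
    return np.stack(tokens)                            # (num_windows, feature_dim)

# Example: 500 irregularly sampled observations of a 3-channel series collapse
# to 8 tokens, which a vanilla Transformer encoder can attend over cheaply.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1.0, size=500))
x = np.cumsum(rng.normal(size=(500, 3)), axis=0)
tokens = multi_view_signature_tokens(t, x, num_windows=8)
print(tokens.shape)  # (8, 40): local + global depth-2 signatures of a 4-channel path
```

Because attention then runs over 8 signature tokens rather than 500 raw observations, its quadratic cost in sequence length is paid over a much shorter input, which is the source of the efficiency gain the abstract describes; and because the signature is computed from the path itself rather than from a fixed grid, the representation stays comparatively robust to changes in sequence length and sampling frequency.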

Authors (5)
  1. Fernando Moreno-Pino (9 papers)
  2. Harrison Waldon (2 papers)
  3. Xiaowen Dong (84 papers)
  4. Álvaro Cartea (15 papers)
  5. Álvaro Arroyo (6 papers)
Citations (3)