
Bidirectional Generative Pre-training for Improving Healthcare Time-series Representation Learning (2402.09558v3)

Published 14 Feb 2024 in cs.AI and cs.LG

Abstract: Learning time-series representations for discriminative tasks, such as classification and regression, has been a long-standing challenge in the healthcare domain. Current pre-training methods are limited to either unidirectional next-token prediction or randomly masked token prediction. We propose a novel architecture called Bidirectional Timely Generative Pre-trained Transformer (BiTimelyGPT), which pre-trains on biosignals and longitudinal clinical records by both next-token and previous-token prediction in alternating transformer layers. This pre-training task preserves the original distribution and data shapes of the time-series. Additionally, the full-rank forward and backward attention matrices exhibit more expressive representation capabilities. Using biosignals and longitudinal clinical records, BiTimelyGPT demonstrates superior performance in predicting neurological functionality, disease diagnosis, and physiological signs. By visualizing the attention heatmap, we observe that the pre-trained BiTimelyGPT can identify discriminative segments in biosignal time-series, even more so after fine-tuning on the task.
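To make the pre-training objective concrete, the sketch below illustrates joint next-token and previous-token prediction on continuous time-series, which is the core idea the abstract describes. For brevity it keeps the forward (causal) and backward (anti-causal) directions in two separate attention stacks with a shared input projection, whereas BiTimelyGPT alternates the two directions across layers of a single transformer; the layer sizes, single-head attention, and mean-squared-error regression heads are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' released code) of generative pre-training on
# continuous time-series with a joint next-token and previous-token objective.
import torch
import torch.nn as nn


class DirectionalAttention(nn.Module):
    """Single-head self-attention restricted to one temporal direction."""

    def __init__(self, d_model: int, forward_dir: bool):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        self.forward_dir = forward_dir

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, D)
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / D ** 0.5       # (B, T, T)
        allowed = torch.tril(torch.ones(T, T, dtype=torch.bool, device=x.device))
        if not self.forward_dir:                          # attend to the future instead
            allowed = allowed.T
        scores = scores.masked_fill(~allowed, float("-inf"))
        return self.proj(scores.softmax(dim=-1) @ v)


class DirectionalStack(nn.Module):
    """A few pre-norm attention + MLP blocks, all looking in one direction."""

    def __init__(self, d_model: int, n_layers: int, forward_dir: bool):
        super().__init__()
        self.attns = nn.ModuleList(DirectionalAttention(d_model, forward_dir)
                                   for _ in range(n_layers))
        self.mlps = nn.ModuleList(nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                                nn.GELU(),
                                                nn.Linear(4 * d_model, d_model))
                                  for _ in range(n_layers))
        self.norms1 = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(n_layers))
        self.norms2 = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(n_layers))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for attn, mlp, n1, n2 in zip(self.attns, self.mlps, self.norms1, self.norms2):
            x = x + attn(n1(x))
            x = x + mlp(n2(x))
        return x


class BiGenerativePretrainer(nn.Module):
    """Shared embedding, forward and backward stacks, and regression heads."""

    def __init__(self, n_features: int, d_model: int = 64, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        self.fwd = DirectionalStack(d_model, n_layers, forward_dir=True)
        self.bwd = DirectionalStack(d_model, n_layers, forward_dir=False)
        self.next_head = nn.Linear(d_model, n_features)   # predicts x[t+1] from x[<=t]
        self.prev_head = nn.Linear(d_model, n_features)   # predicts x[t-1] from x[>=t]

    def loss(self, x: torch.Tensor) -> torch.Tensor:      # x: (B, T, n_features)
        h = self.embed(x)
        next_pred = self.next_head(self.fwd(h))
        prev_pred = self.prev_head(self.bwd(h))
        loss_next = nn.functional.mse_loss(next_pred[:, :-1], x[:, 1:])
        loss_prev = nn.functional.mse_loss(prev_pred[:, 1:], x[:, :-1])
        return loss_next + loss_prev


if __name__ == "__main__":
    torch.manual_seed(0)
    model = BiGenerativePretrainer(n_features=3)
    batch = torch.randn(8, 128, 3)       # toy batch of multichannel biosignal windows
    print(model.loss(batch).item())      # one pre-training loss evaluation
```

Because each stack is strictly one-directional, position t never attends to the value it is asked to predict, so the generative objective stays well-posed without masking or corrupting any input tokens.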
