
SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals (2405.17766v1)

Published 28 May 2024 in cs.LG, cs.AI, and eess.SP

Abstract: Sleep is a complex physiological process evaluated through various modalities recording electrical brain, cardiac, and respiratory activities. We curate a large polysomnography dataset from over 14,000 participants comprising over 100,000 hours of multi-modal sleep recordings. Leveraging this extensive dataset, we developed SleepFM, the first multi-modal foundation model for sleep analysis. We show that a novel leave-one-out approach for contrastive learning significantly improves downstream task performance compared to representations from standard pairwise contrastive learning. A logistic regression model trained on SleepFM's learned embeddings outperforms an end-to-end trained convolutional neural network (CNN) on sleep stage classification (macro AUROC 0.88 vs 0.72 and macro AUPRC 0.72 vs 0.48) and sleep disordered breathing detection (AUROC 0.85 vs 0.69 and AUPRC 0.77 vs 0.61). Notably, the learned embeddings achieve 48% top-1 average accuracy in retrieving the corresponding recording clips of other modalities from 90,000 candidates. This work demonstrates the value of holistic multi-modal sleep modeling to fully capture the richness of sleep recordings. SleepFM is open source and available at https://github.com/rthapa84/sleepfm-codebase.


Summary

  • The paper introduces SleepFM, integrating brain, ECG, and respiratory signals using pairwise and leave-one-out contrastive learning to enhance sleep analysis.
  • The model achieves robust results, with a macro AUROC of 0.88 for sleep stage classification and an AUROC of 0.85 for sleep-disordered breathing (SDB) detection, outperforming end-to-end trained CNNs.
  • The study highlights SleepFM's versatility in demographic prediction and few-shot learning, paving the way for more efficient, automated clinical workflows.

Multi-modal Representation Learning for Sleep: An Expert Overview

"SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals" presents a comprehensive paper leveraging multi-modal data obtained from polysomnography (PSG). The paper introduces SleepFM, a foundation model trained on a substantial dataset consisting of over 100,000 hours of multi-modal sleep recordings from over 14,000 participants. This paper focuses on enhancing sleep analysis using advanced machine learning techniques, particularly multi-modal contrastive learning (CL).

Background and Related Work

Sleep monitoring remains vital for diagnosing sleep disorders and assessing overall health across various physiological systems. Traditionally, sleep data analysis required manual inspection, which is labor-intensive and error-prone. Recent developments in supervised deep learning have automated many of these tasks, yet they largely focus on single-modality data and limited task-specific labels.

Contrastive learning has gained attention for its effectiveness in representation learning by maximizing alignment between data modalities. Previous efforts predominantly applied CL to uni-modal data, such as images or time-series ECG signals. Multi-modal CL, especially for diverse physiological data in sleep studies, has not been widely explored until the introduction of SleepFM.

Methodology

The dataset was collected at the Stanford Sleep Clinic, covering participants aged 2-91 years and spanning multiple measurement channels: brain activity signals (BAS: EEG, EOG, EMG), ECG, and respiratory channels. Preprocessing segmented the sleep recordings into 30-second clips standardized to 256 Hz, with expert-labeled annotations for sleep stages and sleep-disordered breathing (SDB).
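The windowing step can be sketched in a few lines of numpy; the function name and the drop-trailing-samples policy here are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np

def segment_recording(signal, fs=256, clip_seconds=30):
    """Split a 1D recording sampled at `fs` Hz into fixed-length clips.

    Trailing samples that do not fill a whole clip are dropped.
    Mirrors the 30-second / 256 Hz windowing described in the paper.
    """
    clip_len = fs * clip_seconds          # 7680 samples per clip
    n_clips = len(signal) // clip_len
    return signal[: n_clips * clip_len].reshape(n_clips, clip_len)

# One hour of a single channel -> 120 clips of 7680 samples each
recording = np.zeros(3600 * 256)
clips = segment_recording(recording)
print(clips.shape)  # (120, 7680)
```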

The SleepFM model utilizes three 1D convolutional neural networks (CNNs) to encode BAS, ECG, and respiratory data independently. Two contrastive learning frameworks, pairwise CL and leave-one-out CL, were employed to integrate these multi-modal embeddings.

Pairwise CL constructs contrastive tasks between each pair of modalities, pushing embeddings of matching clips closer while penalizing mismatched pairs.

Leave-one-out CL contrasts embeddings of each modality against the average embedding of the remaining modalities, promoting comprehensive multi-modal alignment.
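The two objectives above can be sketched with a batch of per-modality embedding matrices, where clips at the same row index were recorded simultaneously. This is a minimal numpy sketch: the `info_nce` helper, the temperature value, and the summation over ordered pairs are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def info_nce(anchors, targets, temperature=0.1):
    """InfoNCE loss: each anchor row should best match its same-index target."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    logits = a @ t.T / temperature               # (B, B) cosine similarities
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(a))
    return -log_probs[idx, idx].mean()           # negative log-prob of matches

def pairwise_cl_loss(embs):
    """Sum InfoNCE over every ordered pair of modality embeddings."""
    return sum(info_nce(ei, ej)
               for i, ei in enumerate(embs)
               for j, ej in enumerate(embs) if i != j)

def leave_one_out_cl_loss(embs):
    """Contrast each modality against the mean embedding of the others."""
    loss = 0.0
    for i, ei in enumerate(embs):
        rest = np.mean([e for j, e in enumerate(embs) if j != i], axis=0)
        loss += info_nce(ei, rest)
    return loss
```

With three modalities (BAS, ECG, respiratory), pairwise CL optimizes six directed pair terms, while leave-one-out CL optimizes three terms, each pulling one modality toward a fused view of the other two.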

Experimental Results

The paper presented multiple evaluations:

  1. Demographic Attributes Classification:
    • Using logistic regression on SleepFM embeddings yielded superior performance in predicting age and gender compared to an end-to-end CNN. Notably, leave-one-out CL achieved the highest accuracy, reflecting the effective capture of demographic information from short PSG clips.
  2. Retrieval Analysis:
    • SleepFM showed high efficacy in retrieving corresponding modality clips using another modality's embeddings. Specifically, pairwise CL excelled in retrieval tasks, further validating the rich and distinct representations learned.
  3. Downstream Classification Tasks:
    • In sleep stage and SDB event classification, SleepFM outperformed traditional CNNs, especially models pre-trained with leave-one-out CL. For sleep stages, it achieved a macro AUROC of 0.88 versus 0.72 for CNN, and for SDB detection, an AUROC of 0.85 versus 0.69.
  4. Few-Shot Learning:
    • SleepFM demonstrated strong performance even with limited training data, significantly outperforming CNNs across varying sample sizes. The evaluation confirmed the robustness and utility of the pre-trained embeddings for practical, low-data settings.
  5. Multi-Modal Pretraining Benefits:
    • Ablation studies highlighted the superior performance of models trained on three modalities compared to those trained on single or paired modalities. The integrative approach significantly enhanced downstream task efficacy, underscoring the importance of comprehensive multi-modal training.
  6. External Validation:
    • The model generalized well to an external dataset, exceeding the performance of a locally trained supervised CNN. This external validation underscores SleepFM's adaptability and potential applicability across diverse clinical settings.
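The retrieval evaluation in item 2 reduces to nearest-neighbor search in embedding space: a query clip's embedding from one modality should rank its simultaneously recorded counterpart in another modality first. A minimal numpy sketch of top-1 accuracy, assuming row-aligned embedding matrices (the function name is illustrative):

```python
import numpy as np

def top1_retrieval_accuracy(query_embs, candidate_embs):
    """Fraction of queries whose most cosine-similar candidate is the
    clip recorded at the same time, i.e. the same row index."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    nearest = (q @ c.T).argmax(axis=1)           # index of best candidate
    return (nearest == np.arange(len(q))).mean()

# Sanity check: perfectly aligned embeddings retrieve their own clip
rng = np.random.default_rng(0)
ecg = rng.normal(size=(100, 32))
print(top1_retrieval_accuracy(ecg, ecg))  # 1.0
```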

Discussion and Implications

The success of SleepFM in leveraging multi-modal PSG data suggests profound implications for automating and enhancing sleep analysis. The paper highlights the importance of holistic data integration, demonstrating that joint modeling of BAS, ECG, and respiratory signals captures the complexity of physiological dynamics during sleep.

Practically, these findings can streamline clinical workflows, reducing manual annotation time and potential errors. Theoretically, the model’s ability to extract meaningful representations from large-scale unlabeled data opens avenues for future developments in foundation models for other physiological domains.

Future research should focus on extending multi-site and multi-modal pretraining, handling missing data, and exploring other sleep-related tasks. Potential investigations into other self-supervised learning methods might further optimize performance for specific applications.

The paper acknowledges its current limitations, particularly the need for broader validation across various clinical settings and data acquisition protocols. Addressing these limitations will be crucial for transitioning research findings into robust, real-world solutions.

In conclusion, the introduction of SleepFM marks a significant advancement in sleep research methodologies, emphasizing the strength of multi-modal representation learning in capturing the intricacies of sleep physiology. This work not only demonstrates practical improvements over existing methods but also sets the stage for future innovations in the field.