SleepEEGNet: Automated Sleep Stage Annotation
- SleepEEGNet is a deep learning framework that uses CNN and LSTM architectures to classify 30-second EEG epochs into standard sleep stages.
- It combines a two-stream 1D CNN, which extracts temporal and spectral features from single-channel EEG, with a sequence-to-sequence model that uses attention to capture stage context across epochs.
- The model reaches human-level accuracy and Cohen's kappa agreement, in part by addressing imbalanced class distributions through custom loss functions and data oversampling.
SleepEEGNet denotes a class of deep learning models tailored for automated sleep stage annotation from EEG, first formalized in the literature as SLEEPNET (Biswal et al., 2017) and subsequently as SleepEEGNet (Mousavi et al., 2019). Both frameworks address the challenge of reliably classifying 30-second epochs of polysomnographic EEG data into standard sleep stages (W, N1, N2, N3, REM) using convolutional and recurrent neural architectures. SLEEPNET exploited spectral-spatial-temporal learning with multi-channel recordings; SleepEEGNet extended these methods to single-channel settings, integrating sequence-to-sequence modeling and class-balanced loss functions to mitigate imbalanced class distributions. These systems achieve human-level accuracy and Cohen’s kappa agreement with expert scorers, supporting scalable sleep diagnostics.
1. Data Acquisition and Preprocessing
Both SLEEPNET and SleepEEGNet process raw polysomnography (PSG) or single-channel EEG signals sampled at 100 Hz (SleepEEGNet) or 100–200 Hz (SLEEPNET). Standard preprocessing includes band-pass filtering (typically 0.3–35 Hz) and 60 Hz notch filtering to suppress baseline drift and mains-induced noise. The EEG stream is segmented into non-overlapping 30-second epochs for scoring, so each epoch contains $N = 30 \cdot f_s$ samples (3,000 samples per channel at 100 Hz for SleepEEGNet; 3,000–6,000 for SLEEPNET depending on sampling rate).
Epochs are normalized by zero-mean, unit-variance scaling, either per channel and per night (SLEEPNET) or directly at the channel level (SleepEEGNet). SLEEPNET generates spectrograms via a short-time Fourier transform (STFT) per epoch:

$$S[m, k] = \sum_{n} x[n]\, w[n - mH]\, e^{-j 2\pi k n / N_{\mathrm{FFT}}}$$

with a Hamming window $w$ and 50% hop overlap (window length and FFT size as reported in the original work), yielding multi-channel spectrograms of size 128 × 128. SleepEEGNet instead operates directly on the raw time series, extracting temporal and spectral features with two 1D convolutional branches.
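As a rough illustration of this preprocessing pipeline, the following sketch assumes NumPy/SciPy; the Butterworth filter order and notch Q-factor are hypothetical values not specified above.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

EPOCH_SEC = 30  # standard 30-second scoring epochs

def preprocess(eeg: np.ndarray, fs: float = 100.0) -> np.ndarray:
    """Filter a single-channel EEG trace and split it into z-scored 30-s epochs.

    Returns an array of shape (n_epochs, int(fs * EPOCH_SEC)).
    """
    # 0.3-35 Hz band-pass (4th-order Butterworth; order is an assumed value).
    b, a = butter(4, [0.3, 35.0], btype="bandpass", fs=fs)
    x = filtfilt(b, a, eeg)

    # 60 Hz notch to suppress mains noise; only applicable when fs > 120 Hz
    # (at 100 Hz the 50 Hz Nyquist limit already excludes the mains frequency).
    if fs > 2 * 60.0:
        bn, an = iirnotch(w0=60.0, Q=30.0, fs=fs)  # Q is an assumed value
        x = filtfilt(bn, an, x)

    # Channel-level zero-mean, unit-variance normalization.
    x = (x - x.mean()) / (x.std() + 1e-8)

    # Segment into non-overlapping 30-second epochs.
    n = int(fs * EPOCH_SEC)
    n_epochs = len(x) // n
    return x[: n_epochs * n].reshape(n_epochs, n)

# Example: ten minutes of synthetic 100 Hz EEG -> 20 epochs of 3000 samples.
epochs = preprocess(np.random.randn(60_000), fs=100.0)
print(epochs.shape)  # (20, 3000)
```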
2. Model Architectures
Both approaches are characterized by hierarchical temporal modeling and deep feature extraction, though their configurations differ.
SLEEPNET
SLEEPNET employs a two-stage structure:
- CNN Feature Extractor: Processes spectrogram images channel-wise, using three convolutional layers (32, 64, and 128 filters) with max-pooling, batch normalization, ReLU activations, and dropout (0.5); kernel and pooling sizes follow the original specification.
- Recurrent Layer: One or two stacked LSTM layers capture temporal dependencies across consecutive epochs. The LSTM cell at time step $t$ computes:

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), & f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), & \tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, & h_t &= o_t \odot \tanh(c_t),
\end{aligned}
$$

where $\sigma$ denotes the sigmoid function and $\odot$ the Hadamard product.
- Classification: The final (or time-distributed) hidden state is fed to a softmax classifier over the five sleep stages. A compact architectural sketch follows this list.
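The PyTorch sketch below mirrors this two-stage structure (spectrogram CNN feeding an LSTM over consecutive epochs). Filter counts and dropout follow the text; convolution kernel sizes, pooling sizes, the LSTM hidden width, and the 128 × 128 input shape are assumptions, so this is illustrative rather than a faithful reimplementation.

```python
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    """Per-epoch feature extractor over 128 x 128 spectrogram images."""
    def __init__(self, in_channels: int = 1, dropout: float = 0.5):
        super().__init__()
        layers = []
        for c_in, c_out in [(in_channels, 32), (32, 64), (64, 128)]:
            layers += [
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),  # kernel size assumed
                nn.BatchNorm2d(c_out),
                nn.ReLU(),
                nn.MaxPool2d(2),  # pooling size assumed
            ]
        self.features = nn.Sequential(*layers)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):                      # x: (batch, 1, 128, 128)
        z = self.features(x)                   # (batch, 128, 16, 16)
        return self.dropout(z.flatten(1))      # (batch, 128 * 16 * 16)

class SleepStageCNNLSTM(nn.Module):
    """CNN features per epoch, LSTM across a sequence of epochs, softmax head."""
    def __init__(self, n_stages: int = 5, hidden: int = 128):
        super().__init__()
        self.cnn = SpectrogramCNN()
        self.lstm = nn.LSTM(input_size=128 * 16 * 16, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_stages)

    def forward(self, x):                      # x: (batch, seq, 1, 128, 128)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)              # time-distributed hidden states
        return self.head(out)                  # per-epoch logits over 5 stages

# Example: a batch of 2 sequences of 10 consecutive epochs each.
logits = SleepStageCNNLSTM()(torch.randn(2, 10, 1, 128, 128))
print(logits.shape)  # torch.Size([2, 10, 5])
```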
SleepEEGNet
SleepEEGNet incorporates a more granular sequence modeling strategy:
- Two-Stream 1D CNN: Each epoch passes through temporal (small filters, high time resolution) and spectral (large filters, high frequency resolution) streams:
- Temporal stream: Conv1 (64 filters, length 50), max-pooling (8), Conv2–4 (128 filters, length 8), dropout 0.5 throughout.
- Spectral stream: Conv1 (64 filters, length 400), max-pooling (8), Conv2–4 (128 filters, length 30), dropout 0.5.
The output vectors are concatenated to form a 256-dimensional epoch representation.
- Sequence-to-Sequence with Attention:
- Encoder: Two stacked bidirectional LSTMs (128 per direction).
- Decoder: A unidirectional LSTM (256 hidden units) attends over the encoder outputs $h_i$ via:

$$e_{t,i} = \mathrm{score}(s_{t-1}, h_i), \qquad \alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{k}\exp(e_{t,k})}, \qquad c_t = \sum_{i} \alpha_{t,i}\, h_i.$$

The attention context $c_t$ and the embedding of the previously predicted label inform the decoder's prediction at each step; final outputs are softmax probabilities per epoch. A sketch of the two-stream encoder and attention step appears after this list.
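A minimal, non-authoritative sketch of the two-stream 1D CNN encoder and a Bahdanau-style attention step, assuming PyTorch; the strides, padding, pooling of the final feature maps, and the additive score parameterization are assumptions not fixed by the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_out, k):
    # 1D convolution + ReLU; padding choice is an assumption.
    return nn.Sequential(nn.Conv1d(c_in, c_out, k, padding=k // 2), nn.ReLU())

class TwoStreamCNN(nn.Module):
    """Temporal (short filters) and spectral (long filters) branches per 30-s epoch."""
    def __init__(self, dropout: float = 0.5):
        super().__init__()
        self.temporal = nn.Sequential(
            conv_block(1, 64, 50), nn.MaxPool1d(8),
            conv_block(64, 128, 8), conv_block(128, 128, 8), conv_block(128, 128, 8),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Dropout(dropout))
        self.spectral = nn.Sequential(
            conv_block(1, 64, 400), nn.MaxPool1d(8),
            conv_block(64, 128, 30), conv_block(128, 128, 30), conv_block(128, 128, 30),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Dropout(dropout))

    def forward(self, x):                 # x: (batch, 1, 3000) raw epoch at 100 Hz
        return torch.cat([self.temporal(x), self.spectral(x)], dim=1)  # (batch, 256)

def additive_attention(decoder_state, encoder_outputs, W_s, W_h, v):
    """Bahdanau-style attention: scores, softmax weights, and context vector."""
    # decoder_state: (batch, d); encoder_outputs: (batch, T, d); W_s, W_h, v: nn.Linear
    scores = v(torch.tanh(W_s(decoder_state).unsqueeze(1) + W_h(encoder_outputs)))
    alpha = F.softmax(scores.squeeze(-1), dim=1)          # attention weights over epochs
    context = (alpha.unsqueeze(-1) * encoder_outputs).sum(dim=1)
    return context, alpha

# Example: 256-dimensional embedding per 30-second epoch for a batch of 4 epochs.
emb = TwoStreamCNN()(torch.randn(4, 1, 3000))
print(emb.shape)  # torch.Size([4, 256])
```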
3. Training Protocols and Loss Functions
SLEEPNET
- Objective: Multi-class cross-entropy over a mini-batch of $B$ epochs with $C = 5$ stages:

$$\mathcal{L}_{\mathrm{CE}} = -\frac{1}{B} \sum_{b=1}^{B} \sum_{c=1}^{C} y_{b,c} \log \hat{y}_{b,c}$$

- Optimization: Adam optimizer (learning rate and moment-decay hyperparameters as reported in the original work).
- Regularization: Dropout (0.5) after final pooling and on LSTM outputs.
- Data Split: 70% train, 10% validation, 20% test, with subject-level separation. Minority classes are oversampled within each batch (see the sketch after this list).
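The sketch below, a rough illustration assuming PyTorch, shows how per-batch oversampling of minority stages can be combined with the cross-entropy objective via a weighted sampler; the inverse-frequency weighting scheme and placeholder classifier are assumptions, not details taken from the paper.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy stand-ins for per-epoch features and stage labels (0..4 = W, N1, N2, N3, REM).
features = torch.randn(1000, 128)
labels = torch.randint(0, 5, (1000,))

# Sample each epoch with probability inversely proportional to its class frequency,
# so minority stages (e.g., N1) appear more often within each mini-batch.
class_counts = np.maximum(np.bincount(labels.numpy(), minlength=5), 1)
sample_weights = 1.0 / class_counts[labels.numpy()]
sampler = WeightedRandomSampler(torch.as_tensor(sample_weights, dtype=torch.double),
                                num_samples=len(labels), replacement=True)
loader = DataLoader(TensorDataset(features, labels), batch_size=64, sampler=sampler)

model = torch.nn.Linear(128, 5)        # placeholder classifier head
optim = torch.optim.Adam(model.parameters())
loss_fn = torch.nn.CrossEntropyLoss()  # multi-class cross-entropy objective

for x, y in loader:
    optim.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optim.step()
```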
SleepEEGNet
- Objective: Custom class-balanced losses, the mean false error (MFE) and mean squared false error (MSFE):

$$\mathcal{L}_{\mathrm{MFE}} = \sum_{c=1}^{C} \ell_c, \qquad \mathcal{L}_{\mathrm{MSFE}} = \sum_{c=1}^{C} \ell_c^{\,2},$$

with the per-class mean error

$$\ell_c = \frac{1}{|N_c|} \sum_{i \in N_c} \lVert y_i - \hat{y}_i \rVert_2^2,$$

where $N_c$ is the set of epochs whose reference stage is $c$. A minimal implementation sketch appears after this list.
- Optimization: RMSProp with weight regularization (learning rate and regularization coefficient as reported in the original work), batch size 20, training for up to 400 epochs.
- Cross-Validation: 20-fold (EDF-2013) and 10-fold (EDF-2018) cross-validation, split by subject. Training sets were oversampled using SMOTE.
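A minimal sketch of the MFE/MSFE formulation above, assuming PyTorch and one-hot targets; it follows the equations as reconstructed here rather than the authors' exact implementation, and the equal weighting of the two terms in the usage example is an assumption.

```python
import torch
import torch.nn.functional as F

def mfe_msfe_loss(logits: torch.Tensor, targets: torch.Tensor, n_classes: int = 5):
    """Mean false error (MFE) and mean squared false error (MSFE).

    logits:  (batch, n_classes) unnormalized scores
    targets: (batch,) integer stage labels
    """
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(targets, n_classes).float()
    per_sample_err = ((one_hot - probs) ** 2).sum(dim=1)   # squared error per epoch

    per_class_err = []
    for c in range(n_classes):
        mask = targets == c
        if mask.any():                                      # skip classes absent from batch
            per_class_err.append(per_sample_err[mask].mean())
    per_class_err = torch.stack(per_class_err)

    mfe = per_class_err.sum()
    msfe = (per_class_err ** 2).sum()
    return mfe, msfe

# Example usage: combine both terms and backpropagate.
logits = torch.randn(20, 5, requires_grad=True)
targets = torch.randint(0, 5, (20,))
mfe, msfe = mfe_msfe_loss(logits, targets)
(mfe + msfe).backward()
```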
4. Performance Metrics and Quantitative Results
Both models report comprehensive performance metrics: overall accuracy, Cohen's kappa, per-class precision, recall, F1-score, macro F1, and specificity. Confusion matrices reveal that misclassifications occur predominantly between W and N1, and between N1 and N2.
| Model | Dataset | Accuracy (%) | MF1 (%) | Kappa | N1 F1 (%) | N2 F1 (%) | REM F1 (%) |
|---|---|---|---|---|---|---|---|
| SLEEPNET | MGH 1K test set | 85.76* | - | 0.7946 | 50† | 91† | 83† |
| SleepEEGNet | EDF-2013 (Fpz–Cz) | 84.26 | 79.66 | 0.79 | 52.19 | 86.77 | 85.02 |
| SleepEEGNet | EDF-2018 (Fpz–Cz) | 80.03 | 73.55 | 0.73 | - | - | - |
*Accuracy as reported in the SLEEPNET abstract; †approximate per-class recall/F1 values as stated in the methods/results.
Both frameworks match expert–expert inter-rater agreement (κ ≈ 0.75–0.80) (Biswal et al., 2017). SleepEEGNet markedly improves detection of the minority N1 stage over DeepSleepNet (F1 52.2% vs. 46.6%) (Mousavi et al., 2019).
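For completeness, the evaluation metrics listed above can be computed from predicted and reference hypnograms with scikit-learn, as in this brief sketch (illustrative only, using random labels as stand-ins for real scorings).

```python
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score, classification_report)

STAGES = ["W", "N1", "N2", "N3", "REM"]

# Toy reference and predicted hypnograms (one label per 30-second epoch).
y_true = np.random.randint(0, 5, 2000)
y_pred = np.random.randint(0, 5, 2000)

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Cohen's kappa:", cohen_kappa_score(y_true, y_pred))
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
print(confusion_matrix(y_true, y_pred))   # rows = true stage, columns = predicted stage
print(classification_report(y_true, y_pred, target_names=STAGES))
```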
5. Technical Considerations and Limitations
Both frameworks exploit the temporal and spectral (and, in SLEEPNET's multi-channel case, spatial) characteristics of EEG for sleep staging.
- Strengths: Feature learning is entirely data-driven (no hand-engineered features). CNNs capture frequency and spatial information; LSTMs model contextual dependencies across epochs, improving resistance to isolated misclassification. SleepEEGNet’s sequence-to-sequence LSTM with attention provides interpretability regarding influential input epochs.
- Limitations: N1 remains the least reliably classified due to low prevalence and expert variability. Both SLEEPNET and SleepEEGNet require substantial labeled data for training. SLEEPNET highlights that adding EOG/EMG modalities could enhance REM/N1 discrimination, and proposes semi-supervised or domain-adaptive extensions to reduce annotation dependence. SleepEEGNet’s inference is sensitive to fixed sequence length, and was only validated on single-channel EEG; generalization to multimodal PSG is a proposed future direction.
6. Extensions and Prospective Applications
The methodologies underlying SleepEEGNet are applicable to other sequence-based biomedical classification tasks with imbalanced class distributions, such as arrhythmia detection or epileptic spike identification (Mousavi et al., 2019). Real-time low-compute variants (e.g., 1D CNNs, quantized LSTM) have been proposed for bedside or portable deployment (Biswal et al., 2017). There is ongoing interest in generalizing these architectures to incorporate multimodal physiological signals (EEG, EOG, EMG) and in visualizing attention mechanisms to further elucidate model decision-making processes.
7. Comparative and Contextual Remarks
SleepEEGNet improves upon prior CNN-LSTM systems (notably DeepSleepNet) by explicit sequence-to-sequence modeling and custom loss functions for class balancing (Mousavi et al., 2019). SLEEPNET, trained on the largest PSG cohort reported (MGH, 10,000+ subjects), demonstrates scalability and robust expert-level agreement (Biswal et al., 2017). Both approaches are positioned at the intersection of end-to-end representation learning and clinical diagnostics, aligning with trends in automating labor-intensive annotation tasks using deep neural architectures.