
SleepEEGNet: Automated Sleep Stage Annotation

Updated 8 December 2025
  • SleepEEGNet is a deep learning framework that uses CNN and LSTM architectures to classify 30-second EEG epochs into standard sleep stages.
  • It integrates a two-stream 1D CNN and sequence-to-sequence attention to extract both temporal and spectral features from single-channel EEG signals.
  • The models reach human-level accuracy and Cohen’s kappa agreement with expert scorers, in part by addressing imbalanced class distributions through custom loss functions and data oversampling.

SleepEEGNet denotes a class of deep learning models tailored for automated sleep stage annotation from EEG, first formalized in the literature as SLEEPNET (Biswal et al., 2017) and subsequently as SleepEEGNet (Mousavi et al., 2019). Both frameworks address the challenge of reliably classifying 30-second epochs of polysomnographic EEG data into standard sleep stages (W, N1, N2, N3, REM) using convolutional and recurrent neural architectures. SLEEPNET exploited spectral-spatial-temporal learning with multi-channel recordings; SleepEEGNet extended these methods to single-channel settings, integrating sequence-to-sequence modeling and class-balanced loss functions to mitigate imbalanced class distributions. These systems achieve human-level accuracy and Cohen’s kappa agreement with expert scorers, supporting scalable sleep diagnostics.

1. Data Acquisition and Preprocessing

Both SLEEPNET and SleepEEGNet process raw polysomnography (PSG) or single-channel EEG signals sampled at either 100 Hz (SleepEEGNet) or 100–200 Hz (SLEEPNET). Standard preprocessing includes band-pass filtering (typically 0.3–35 Hz) and 60 Hz notch filtering to suppress baseline drift and mains-induced noise. The EEG stream is segmented into non-overlapping 30-second epochs for scoring, resulting in vectors of $N = f_s \times 30$ samples per epoch (e.g., $N = 200 \times 30 = 6000$ for SLEEPNET, $N = 3000$ for SleepEEGNet).
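As a concrete illustration, the sketch below implements this stage in Python with SciPy, including the per-epoch normalization described next; the Butterworth filter order, zero-phase filtering, and the function name are assumptions rather than details taken from either paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

def preprocess_channel(x, fs=100.0):
    """Band-pass, notch-filter, segment, and normalize one raw EEG channel.

    x  : 1D array of raw samples for a full night
    fs : sampling rate in Hz (100 Hz for SleepEEGNet-style input)
    Returns an array of shape (n_epochs, fs * 30).
    """
    x = np.asarray(x, dtype=float)

    # Band-pass 0.3-35 Hz (4th-order Butterworth, zero-phase; order assumed)
    b, a = butter(4, [0.3, 35.0], btype="bandpass", fs=fs)
    x = filtfilt(b, a, x)

    # 60 Hz mains notch; only applicable when fs exceeds the Nyquist bound
    if fs > 2 * 60.0:
        bn, an = iirnotch(60.0, Q=30.0, fs=fs)
        x = filtfilt(bn, an, x)

    # Non-overlapping 30 s epochs: N = fs * 30 samples each
    n = int(fs * 30)
    epochs = x[: (len(x) // n) * n].reshape(-1, n)

    # Zero-mean, unit-variance normalization at the channel/epoch level
    mu = epochs.mean(axis=1, keepdims=True)
    sd = epochs.std(axis=1, keepdims=True) + 1e-8
    return (epochs - mu) / sd
```

Note that at 100 Hz the 60 Hz notch is skipped, since the mains frequency lies above the Nyquist limit.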

Epochs are normalized by zero-mean, unit-variance scaling—either per-channel and per-night (SLEEPNET) or directly at the channel level (SleepEEGNet). SLEEPNET generates spectrograms via short-time Fourier transform (STFT) per epoch:

S(f,t)=m=0M1x~[m+t]w[m]ej2πfm/M2S(f, t) = \left| \sum_{m=0}^{M-1} \tilde{x}[m+t]\,w[m]\,e^{-j2\pi f m/M} \right|^2

with window size $M = 256$, a 50% hop, and a Hamming window $w[m]$, yielding multi-channel spectrograms of size $128 \times 128$. SleepEEGNet instead leverages the raw time series directly, extracting temporal and spectral features with two 1D convolutional branches.
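A minimal sketch of this transform using scipy.signal.stft; the log compression and the omission of the final resize to $128 \times 128$ inputs are simplifying assumptions.

```python
import numpy as np
from scipy.signal import stft

def epoch_spectrogram(epoch, fs=200.0, M=256):
    """Log-power spectrogram of one 30 s epoch (SLEEPNET-style front end).

    Uses an M = 256-sample Hamming window with 50% hop (noverlap = M // 2).
    """
    f, t, Z = stft(epoch, fs=fs, window="hamming", nperseg=M, noverlap=M // 2)
    power = np.abs(Z) ** 2           # |STFT|^2, matching the equation above
    return np.log1p(power)           # log compression (an assumed choice)
```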

2. Model Architectures

Both approaches are characterized by hierarchical temporal modeling and deep feature extraction, though their configurations differ.

SLEEPNET

SLEEPNET employs a two-stage structure:

  1. CNN Feature Extractor: Processes spectrogram images channel-wise, using three convolutional layers (32, 64, 128 filters; kernel sizes $5\times5$, $5\times5$, $3\times3$; max-pooling $2\times2$), batch normalization, ReLU activations, and dropout (0.5).
  2. Recurrent Layer: One or two stacked LSTM layers, capturing temporal dependencies across consecutive epochs. The LSTM cell at time step $t$ computes:

$$\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \circ c_{t-1} + i_t \circ \tilde{c}_t \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
h_t &= o_t \circ \tanh(c_t)
\end{aligned}$$

where $\sigma$ denotes the sigmoid function and $\circ$ the Hadamard product; a NumPy transcription of this cell follows the list below.

  3. Classification: The final (or time-distributed) hidden state is fed to a softmax classifier over the five sleep stages.
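For concreteness, the NumPy sketch below transcribes the gate equations one-to-one; the dict-based parameter layout is purely an organizational assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step, transcribing the gate equations above.

    W, U, b are dicts of weight matrices / bias vectors keyed by
    gate name: "i" (input), "f" (forget), "c" (candidate), "o" (output).
    """
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])     # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])     # forget gate
    c_hat = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])   # candidate state
    c_t = f_t * c_prev + i_t * c_hat                           # Hadamard products
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])     # output gate
    h_t = o_t * np.tanh(c_t)                                   # hidden state
    return h_t, c_t
```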

SleepEEGNet

SleepEEGNet incorporates a more granular sequence modeling strategy:

  1. Two-Stream 1D CNN: Each epoch passes through a temporal stream (small filters, high time resolution) and a spectral stream (large filters, high frequency resolution):
  • Temporal: Conv1 (64 filters, kernel size 50), max-pooling (8), Conv2–4 (128 filters, kernel size 8), dropout 0.5 throughout.
  • Spectral: Conv1 (64 filters, kernel size 400), max-pooling (8), Conv2–4 (128 filters, kernel size 30), dropout 0.5.

The output vectors are concatenated to form a 256-dimensional epoch representation.
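A hedged PyTorch sketch of this two-stream front end is given below; strides, padding, the pooling schedule, and the final global average pooling are assumptions, since the text above fixes only filter counts, kernel sizes, and dropout.

```python
import torch
import torch.nn as nn

class TwoStreamCNN(nn.Module):
    """Sketch of SleepEEGNet's two-stream 1D CNN front end.

    Strides, padding, and the pooling schedule are assumptions; the
    paper's exact configuration differs in such details.
    """

    def __init__(self):
        super().__init__()
        self.temporal = self._stream(k_first=50, k_rest=8)     # small filters
        self.spectral = self._stream(k_first=400, k_rest=30)   # large filters

    @staticmethod
    def _stream(k_first, k_rest):
        return nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=k_first, stride=k_first // 8),
            nn.ReLU(),
            nn.MaxPool1d(8),
            nn.Dropout(0.5),
            nn.Conv1d(64, 128, kernel_size=k_rest, padding="same"), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=k_rest, padding="same"), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=k_rest, padding="same"), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # collapse time into one 128-d vector
        )

    def forward(self, x):                    # x: (batch, 1, 3000)
        z_t = self.temporal(x).squeeze(-1)   # (batch, 128)
        z_s = self.spectral(x).squeeze(-1)   # (batch, 128)
        return torch.cat([z_t, z_s], dim=1)  # (batch, 256) epoch embedding
```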

  2. Sequence-to-Sequence with Attention:
    • Encoder: Two stacked bidirectional LSTMs (128 per direction).
    • Decoder: Unidirectional LSTM (256 hidden units) attends over encoder outputs via:

$$\alpha_i^{(t)} = \frac{\exp\bigl(\tanh(W_h h_{t-1} + W_e e_i)\bigr)}{\sum_{j=1}^{T} \exp\bigl(\tanh(W_h h_{t-1} + W_e e_j)\bigr)}, \qquad c_t = \sum_{i=1}^{T} \alpha_i^{(t)} e_i$$

The attention context $c_t$ and the previous label embedding inform the decoder’s prediction. Final outputs are softmax probabilities per epoch.
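A PyTorch sketch of this attention step follows. The written formula applies the exponential directly to a tanh output; the learned scoring vector $v$ used below to reduce that vector to a scalar is an assumption, following the standard additive-attention form.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Additive (Bahdanau-style) attention over encoder outputs e_i."""

    def __init__(self, dec_dim=256, enc_dim=256, attn_dim=128):
        super().__init__()
        self.W_h = nn.Linear(dec_dim, attn_dim, bias=False)   # projects h_{t-1}
        self.W_e = nn.Linear(enc_dim, attn_dim, bias=False)   # projects e_i
        self.v = nn.Linear(attn_dim, 1, bias=False)           # assumed scoring vector

    def forward(self, h_prev, enc_outs):
        # h_prev: (batch, dec_dim); enc_outs: (batch, T, enc_dim)
        scores = self.v(torch.tanh(
            self.W_h(h_prev).unsqueeze(1) + self.W_e(enc_outs)))  # (batch, T, 1)
        alpha = torch.softmax(scores, dim=1)     # attention weights alpha_i^(t)
        c_t = (alpha * enc_outs).sum(dim=1)      # context c_t: (batch, enc_dim)
        return c_t, alpha.squeeze(-1)
```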

3. Training Protocols and Loss Functions

SLEEPNET

  • Objective: Multi-class cross-entropy over a mini-batch of $B$ epochs:

$$L = -\frac{1}{B}\sum_{b=1}^{B} \sum_{i=1}^{C} y_{b,i}\,\log(p_{b,i})$$

  • Optimization: Adam optimizer ($\alpha = 10^{-3}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$).
  • Regularization: Dropout (0.5) after final pooling and on LSTM outputs.
  • Data Split: 70% train, 10% validation, 20% test, subject-level separation. Minority classes are oversampled per batch.
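A subject-level split of this kind can be sketched with scikit-learn's GroupShuffleSplit; the helper name and the two-stage construction are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def subject_level_split(X, y, subjects, seed=0):
    """70/10/20 train/val/test split with no subject shared across splits.

    X        : (n_epochs, ...) epoch features
    y        : (n_epochs,) stage labels
    subjects : (n_epochs,) subject ID for each epoch
    """
    # Hold out 20% of subjects for the test set
    outer = GroupShuffleSplit(n_splits=1, test_size=0.20, random_state=seed)
    trainval_idx, test_idx = next(outer.split(X, y, groups=subjects))

    # 12.5% of the remaining 80% of subjects = 10% overall for validation
    inner = GroupShuffleSplit(n_splits=1, test_size=0.125, random_state=seed)
    tr, va = next(inner.split(X[trainval_idx], y[trainval_idx],
                              groups=subjects[trainval_idx]))
    return trainval_idx[tr], trainval_idx[va], test_idx
```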

SleepEEGNet

  • Objective: Custom class-balanced losses (mean false error, MFE; mean squared false error, MSFE; a code sketch follows this list):

$$\ell(c_i) = \frac{1}{C_i}\sum_{j=1}^{C_i}\left(y_{j}^{(i)} - \hat{y}_{j}^{(i)}\right)^2$$

where $C_i$ is the number of samples in class $c_i$, with

$$\mathcal{L}_{\mathrm{MFE}} = \sum_{i=1}^{N} \ell(c_i), \qquad \mathcal{L}_{\mathrm{MSFE}} = \sum_{i=1}^{N} \left[\ell(c_i)\right]^2$$

summed over the $N$ classes.

  • Optimization: RMSProp ($\alpha = 10^{-4}$), $L_2$ regularization ($\lambda = 10^{-3}$), batch size 20, up to 400 training epochs.
  • Cross-Validation: 20-fold (EDF-2013), 10-fold (EDF-2018), by subject. Training sets were oversampled using SMOTE.
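The MFE/MSFE objectives above can be sketched in PyTorch as follows; reading the per-sample error as a squared difference summed over the five class outputs is an interpretive assumption.

```python
import torch
import torch.nn.functional as F

def mfe_msfe(probs, targets, n_classes=5):
    """Mean false error and mean squared false error over a batch.

    probs   : (batch, n_classes) softmax outputs
    targets : (batch,) integer stage labels
    """
    onehot = F.one_hot(targets, n_classes).float()
    # Per-sample squared error, summed over the class outputs (assumed reading)
    sq_err = ((onehot - probs) ** 2).sum(dim=1)

    mfe = probs.new_zeros(())
    msfe = probs.new_zeros(())
    for c in range(n_classes):
        mask = targets == c
        if mask.any():
            l_c = sq_err[mask].mean()   # class-wise mean error l(c_i)
            mfe = mfe + l_c             # L_MFE: sum of class means
            msfe = msfe + l_c ** 2      # L_MSFE: sum of squared class means
    return mfe, msfe
```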

4. Performance Metrics and Quantitative Results

Both models report comprehensive performance metrics: overall accuracy, Cohen's kappa, per-class precision, recall, F1-score, macro F1, and specificity. Confusion matrices reveal that confusions predominantly occur between N1 and W, and N1 and N2.

| Model | Dataset | Accuracy (%) | MF1 (%) | Kappa | N1 F1 (%) | N2 F1 (%) | REM F1 (%) |
|---|---|---|---|---|---|---|---|
| SLEEPNET | MGH 1K test set | 85.76* | - | 0.7946 | 50† | 91† | 83† |
| SleepEEGNet | EDF-2013 (Fpz–Cz) | 84.26 | 79.66 | 0.79 | 52.19 | 86.77 | 85.02 |
| SleepEEGNet | EDF-2018 (Fpz–Cz) | 80.03 | 73.55 | 0.73 | - | - | - |

*Accuracy as reported in the abstract; †per-class recall/F1 values as stated in the methods/results.

Both frameworks match expert–expert inter-rater agreement (κ ≈ 0.75–0.80) (Biswal et al., 2017). SleepEEGNet markedly improves detection of the minority N1 stage over DeepSleepNet (F1 52.2% vs. 46.6%) (Mousavi et al., 2019).
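These headline metrics can be reproduced from a predicted hypnogram with scikit-learn; the integer label ordering below is an assumed encoding.

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score)

STAGES = ["W", "N1", "N2", "N3", "REM"]   # assumed integer label order 0..4

def score_hypnogram(y_true, y_pred):
    """Overall accuracy, Cohen's kappa, macro F1, per-class F1, confusion."""
    labels = list(range(len(STAGES)))
    per_class = f1_score(y_true, y_pred, average=None, labels=labels)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "kappa": cohen_kappa_score(y_true, y_pred),
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
        "per_class_f1": dict(zip(STAGES, per_class)),
        "confusion": confusion_matrix(y_true, y_pred, labels=labels),
    }
```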

5. Technical Considerations and Limitations

Both frameworks leverage spatial and temporal characteristics of EEG for sleep staging.

  • Strengths: Feature learning is entirely data-driven (no hand-engineered features). CNNs capture frequency and spatial information; LSTMs model contextual dependencies across epochs, improving resistance to isolated misclassification. SleepEEGNet’s sequence-to-sequence LSTM with attention provides interpretability regarding influential input epochs.
  • Limitations: N1 remains the least reliably classified due to low prevalence and expert variability. Both SLEEPNET and SleepEEGNet require substantial labeled data for training. SLEEPNET highlights that adding EOG/EMG modalities could enhance REM/N1 discrimination, and proposes semi-supervised or domain-adaptive extensions to reduce annotation dependence. SleepEEGNet’s inference is sensitive to fixed sequence length, and was only validated on single-channel EEG; generalization to multimodal PSG is a proposed future direction.

6. Extensions and Prospective Applications

The methodologies underlying SleepEEGNet are applicable to other sequence-based biomedical classification tasks with imbalanced class distributions, such as arrhythmia detection or epileptic spike identification (Mousavi et al., 2019). Real-time low-compute variants (e.g., 1D CNNs, quantized LSTM) have been proposed for bedside or portable deployment (Biswal et al., 2017). There is ongoing interest in generalizing these architectures to incorporate multimodal physiological signals (EEG, EOG, EMG) and in visualizing attention mechanisms to further elucidate model decision-making processes.

7. Comparative and Contextual Remarks

SleepEEGNet improves upon prior CNN-LSTM systems (notably DeepSleepNet) by explicit sequence-to-sequence modeling and custom loss functions for class balancing (Mousavi et al., 2019). SLEEPNET, trained on the largest PSG cohort reported (MGH, 10,000+ subjects), demonstrates scalability and robust expert-level agreement (Biswal et al., 2017). Both approaches are positioned at the intersection of end-to-end representation learning and clinical diagnostics, aligning with trends in automating labor-intensive annotation tasks using deep neural architectures.

References

  1. Biswal, S., et al. (2017). SLEEPNET: Automated Sleep Staging System via Deep Learning. arXiv:1707.08262.
  2. Mousavi, S., Afghah, F., & Acharya, U. R. (2019). SleepEEGNet: Automated sleep stage scoring with sequence to sequence deep learning approach. PLOS ONE, 14(5), e0216456.