Hybrid CNN-GRU-mRMR Model for EEG Depression Detection
- The paper presents a hybrid model that combines CNN for spatial and GRU for temporal feature extraction with mRMR for optimal feature selection in EEG-based depression detection.
- The CNN branch extracts 20 spatial features and the GRU branch captures 100 temporal features, which are fused and reduced to 30 key features for robust classification.
- Performance metrics show high sensitivity (97.9%), perfect specificity (100%), and an overall accuracy of 98.42%, highlighting its clinical relevance as an objective biomarker.
A hybrid CNN-GRU-mRMR model is a deep learning framework that jointly exploits convolutional neural networks (CNNs) and gated recurrent units (GRUs) for feature extraction, applies minimum redundancy–maximum relevance (mRMR) feature selection, and uses a fully connected neural network for classification. In clinical neurophysiology, specifically electroencephalographic (EEG) depression detection, this architecture integrates spatial and temporal characteristics of multi-channel EEG signals, then distills them to the most relevant features for robust and compact downstream classification (Yousefi et al., 16 Jan 2026).
1. Model Architecture and Workflow
The CNN-GRU-mRMR model is structured as two parallel feature-extraction branches followed by sequential selection and classification stages:
- Spatial Feature Extraction: A CNN branch processes the input EEG segment (3 channels × T samples) to extract 20 spatial features. It operates channel-wise, applying learned spatial filters to identify activation patterns invariant to temporal shifts.
- Temporal Feature Extraction: In parallel, a GRU branch treats the EEG sample as a multivariate time series to extract 100 temporal features, modeling short- and long-term dependencies via gated recurrent computations.
- Feature Fusion and Selection: The concatenated 120-dimensional feature vector (20 CNN + 100 GRU features) undergoes dimensionality reduction using the mRMR algorithm, which optimizes feature set relevance to the supervised label (depressed vs. healthy) while minimizing redundancy among selected features, resulting in a 30-dimensional feature vector.
- Classification: The reduced features power a fully connected network (multiple hidden layers with ReLU activations), concluded with a 2-unit Softmax output for binary classification.
A schematic textual description:
```
Input EEG segment (3×T)
          │
   ┌──────┴───────┐
   │              │
CNN branch    GRU branch
(20 spatial)  (100 temporal)
   │              │
   └──────┬───────┘
     120-D concat
          │
    mRMR selection
    (30 features)
          │
  Fully connected net
          │
  Softmax (2 classes)
```
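The end-to-end data flow can be traced with a shape-level NumPy sketch. Only the stage dimensionalities (3×T input, 20 + 100 fused features, 30 selected, 2-way softmax) come from the paper; the segment length, the branch projections, and the hidden-layer size are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000                              # samples per segment (T is not specified in the paper)
x = rng.standard_normal((3, T))       # one 3-channel EEG segment

# Stand-ins for the two learned branches: only the output sizes
# (20 spatial, 100 temporal) come from the paper.
def cnn_branch(seg):
    W = rng.standard_normal((20, seg.size))
    return np.maximum(W @ seg.ravel(), 0.0)      # 20 "spatial" features

def gru_branch(seg):
    W = rng.standard_normal((100, seg.shape[0]))
    return np.tanh(W @ seg).mean(axis=1)         # 100 "temporal" features

fused = np.concatenate([cnn_branch(x), gru_branch(x)])   # 120-D fused vector
selected = fused[:30]     # placeholder for the 30 indices mRMR would pick

# Dense head: one hidden ReLU layer, then a 2-way softmax
h = np.maximum(rng.standard_normal((16, 30)) @ selected, 0.0)
logits = rng.standard_normal((2, 16)) @ h
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(fused.shape, selected.shape, probs.shape)
```

In the actual model the two branches are trained networks and the 30 retained indices are chosen by mRMR; the sketch only verifies that the stage dimensions compose.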
2. Component Modules: CNN and GRU Feature Extraction
CNN Branch
The CNN branch is designed to capture spatial correlations within the multi-channel EEG data. The generic convolutional block employed is:

$$y_i = \mathrm{ReLU}\!\left(\sum_{k} w_k\, x_{i+k} + b\right)$$

Max-pooling follows:

$$p_j = \max_{i \in \mathcal{W}_j} y_i,$$

where $\mathcal{W}_j$ is the $j$-th pooling window.
The CNN generates a fixed-length 20-dimensional feature vector. The implementation specifics (e.g., number of convolutional layers, filters, kernel sizes) are not supplied; a reasonable replication may use three convolutional layers (32, 64, 128 filters) with small kernels, max-pooling, and flattening into the final feature vector.
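A valid-mode 1-D convolution with ReLU followed by non-overlapping max-pooling, the two building blocks of such a branch, can be sketched with a toy kernel and signal:

```python
import numpy as np

def conv1d_relu(x, w, b):
    """Valid-mode 1-D convolution + ReLU: y[i] = max(0, sum_k w[k]*x[i+k] + b)."""
    L = len(x) - len(w) + 1
    y = np.array([np.dot(w, x[i:i + len(w)]) for i in range(L)]) + b
    return np.maximum(y, 0.0)

def maxpool1d(x, size):
    """Non-overlapping max-pooling with the given window size."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

x = np.sin(np.linspace(0, 8 * np.pi, 64))                  # toy single-channel signal
y = conv1d_relu(x, w=np.array([0.25, 0.5, 0.25]), b=0.0)   # smoothing kernel
p = maxpool1d(y, size=2)
print(y.shape, p.shape)   # (62,) (31,)
```

Stacking several such blocks and flattening the result yields the fixed-length feature vector described above.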
GRU Branch
The GRU module models temporal dependencies in the EEG sequence, producing a 100-dimensional output vector. The update equations for a single-layer GRU with 100 hidden units are:

- Update gate: $z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$
- Reset gate: $r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$
- Candidate hidden state: $\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)$
- Hidden state: $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$

where $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication.
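The gate computations above translate directly into code; a minimal NumPy sketch, where the weight scale, sequence length, and initialization are illustrative:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, P):
    """One GRU update: update gate, reset gate, candidate state, new hidden state."""
    z = sigmoid(P["Wz"] @ x_t + P["Uz"] @ h_prev + P["bz"])              # update gate
    r = sigmoid(P["Wr"] @ x_t + P["Ur"] @ h_prev + P["br"])              # reset gate
    h_tilde = np.tanh(P["Wh"] @ x_t + P["Uh"] @ (r * h_prev) + P["bh"])  # candidate
    return (1.0 - z) * h_prev + z * h_tilde

rng = np.random.default_rng(1)
n_in, n_hid = 3, 100     # 3 EEG channels in, 100 temporal features out
P = {k: rng.standard_normal((n_hid, n_in)) * 0.1 for k in ("Wz", "Wr", "Wh")}
P |= {k: rng.standard_normal((n_hid, n_hid)) * 0.1 for k in ("Uz", "Ur", "Uh")}
P |= {k: np.zeros(n_hid) for k in ("bz", "br", "bh")}

h = np.zeros(n_hid)
for t in range(50):                       # unroll over a toy 50-step sequence
    h = gru_step(rng.standard_normal(n_in), h, P)
print(h.shape)   # (100,)
```

The final hidden state serves as the branch's 100-dimensional temporal feature vector.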
Temporal feature learning stabilizes rapidly during training, with the loss falling from approximately 0.4 to 0.1.
3. Feature Dimensionality Reduction via mRMR
Post-fusion, the combined feature vector is subjected to mRMR feature selection, a filter-based method optimizing for maximal relevance to the depression state label while minimizing inter-feature redundancy. The selection criterion is:
$$\max_{S}\ \Phi(S) = D(S, c) - R(S),$$

where $D(S, c) = \frac{1}{|S|} \sum_{f_i \in S} I(f_i; c)$ is the relevance of the selected set $S$ to the class label $c$, $R(S) = \frac{1}{|S|^2} \sum_{f_i, f_j \in S} I(f_i; f_j)$ is the inter-feature redundancy, and $I(\cdot\,;\cdot)$ denotes mutual information. The algorithm outputs the 30 most informative and non-redundant features from the original 120.
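In practice the criterion is optimized greedily, adding at each step the feature that maximizes relevance minus mean redundancy with the already-selected set. A sketch using a simple histogram mutual-information estimator (the data, bin count, and feature construction are illustrative):

```python
import numpy as np

def mutual_info(a, b, bins=8):
    """Histogram estimate of I(a; b) in nats for two 1-D arrays."""
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())

def mrmr(X, y, k):
    """Greedy mRMR: start from the most relevant feature, then add the
    feature maximizing relevance minus mean redundancy with the set."""
    n_feat = X.shape[1]
    relevance = [mutual_info(X[:, j], y) for j in range(n_feat)]
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, i]) for i in selected])
            if relevance[j] - redundancy > best_score:
                best, best_score = j, relevance[j] - redundancy
        selected.append(best)
    return selected

rng = np.random.default_rng(2)
y = rng.integers(0, 2, 200).astype(float)
X = rng.standard_normal((200, 10))
X[:, 0] += 2 * y                                       # strongly class-relevant feature
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(200)    # redundant near-copy of it
idx = mrmr(X, y, k=3)
print(idx)   # indices of the 3 selected features
```

The redundancy penalty is what discourages selecting both of the nearly identical features, which plain relevance ranking would not.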
4. Data Processing and Training Paradigm
Dataset and Preprocessing
- Dataset: MODMA dataset, consisting of 53 subjects (24 MDD, 29 controls), 3-channel wearable EEG, ages 16–52, under resting-state and mild stimulation.
- Preprocessing: Multi-stage, including artifact removal, outlier correction, band-pass filtering (4.5–45 Hz), and normalization (zero mean, unit variance per channel).
- Segmentation: Each recording is divided into 10 nonoverlapping epochs, yielding 530 segments (290 healthy, 240 depressed).
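The normalization and segmentation steps can be sketched as follows; the recording length is illustrative, and artifact removal plus the 4.5–45 Hz band-pass stage (e.g., via a Butterworth filter) would precede this in the full pipeline:

```python
import numpy as np

def preprocess_and_segment(rec, n_epochs=10):
    """Per-channel z-scoring (zero mean, unit variance), then a split
    into non-overlapping epochs of equal length."""
    rec = (rec - rec.mean(axis=1, keepdims=True)) / rec.std(axis=1, keepdims=True)
    n_ch, T = rec.shape
    L = T // n_epochs
    return [rec[:, i * L:(i + 1) * L] for i in range(n_epochs)]

rng = np.random.default_rng(3)
recording = rng.standard_normal((3, 2500))     # one subject: 3 channels × T samples
epochs = preprocess_and_segment(recording)
print(len(epochs), epochs[0].shape)   # 10 (3, 250)
```

Applying this to all 53 recordings yields the 530 labeled segments described above.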
Training Protocol
- Loss function: Cross-entropy.
- Optimizer and batch size: Not reported. Adam (lr=1e-3) and batch size 16–32 are common defaults.
- Regularization: Dropout and weight decay not mentioned. Dropout after FC layers may aid generalization.
- Split: 70% train (371), 30% test (159).
- Epochs: the CNN branch converges after ~2,000 iterations; the FC classifier after ~90 iterations.
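The cross-entropy objective on the 2-unit softmax output reduces to the standard log-softmax form; a numerically stable NumPy sketch with toy logits and labels:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over a batch: -log softmax(logits)[label]."""
    z = logits - logits.max(axis=1, keepdims=True)   # shift for a stable exp
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5], [0.1, 1.9]])   # two segments, 2-class scores
labels = np.array([0, 1])                     # 0 = healthy, 1 = depressed (toy)
loss = softmax_cross_entropy(logits, labels)
print(round(loss, 4))   # 0.1772
```

Any gradient-based optimizer (Adam being the usual default, as noted above) would minimize this quantity over the training segments.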
5. Performance Analysis and Benchmarking
The proposed model demonstrates superior classification accuracy and discriminative capability relative to contemporary baselines.
Test/Benchmark Results
| Method | Accuracy (%) |
|---|---|
| CNN + GRU [43] | 89.63 |
| CNN only [44] | 91.01 |
| ResNet-50 + LSTM [45] | 90.02 |
| CNN only (Ksibi et al. [46]) | 97.00 |
| CNN–GRU–mRMR–Dense (proposed) | 98.42 |
Key metrics on the test set of 159 segments:
- Sensitivity: 97.9% (94/96, depressed correctly identified)
- Specificity: 100% (63/63, healthy correctly identified)
- Precision: 97.92%
- Recall: 100%
- F1-score: 98.95%
- ROC-AUC: 0.9846
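The sensitivity and specificity values follow directly from the reported per-class counts on the 159-segment test set:

```python
# Confusion counts reported for the test set
tp, fn = 94, 2     # depressed: 94 of 96 correctly identified
tn, fp = 63, 0     # healthy: 63 of 63 correctly identified

sensitivity = tp / (tp + fn)   # recall on the depressed class
specificity = tn / (tn + fp)
print(f"sensitivity {100*sensitivity:.1f}%, specificity {100*specificity:.0f}%")
```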
These findings underscore the robust detection capacity of spatial-temporal feature integration with optimized selection.
6. Clinical Relevance and Research Directions
The model's high sensitivity is advantageous for screening, minimizing undetected cases in populations at risk for depression, while perfect specificity precludes healthy subjects from being erroneously flagged. As an objective EEG-based biomarker, this approach addresses the subjectivity and potential unreliability of self-report assessments, providing complementary evidence for clinical interviews.
Potential applications include:
- Augmenting existing diagnostic procedures for depressive disorders,
- Integration with neurostimulation therapies (tDCS, TMS) for real-time monitoring and treatment personalization,
- Deployment as a screening or neurofeedback tool.
Limitations include the use of a small, low-density (3-channel) EEG sample set, potentially restricting generalizability. The model would benefit from external validation on higher-density, larger-scale EEG datasets and under real-world clinical conditions. Future explorations may involve substituting the GRU branch with architectures such as bi-directional GRUs or Transformers, and advancing regularization techniques. Real-time and mobile platform implementation also represents a salient direction.
7. Summary
The hybrid CNN-GRU-mRMR framework for EEG-based depression detection leverages spatial and temporal deep encoding, sophisticated feature selection, and dense neural classification to achieve state-of-the-art accuracy (>98%) on the MODMA benchmark. It represents a technically rigorous, data-driven alternative to subjective clinical scales, with potential for impactful clinical translation and adaptation to broader neural decoding tasks (Yousefi et al., 16 Jan 2026).