EEG-Based Emotion Recognition
- EEG-based emotion recognition is the automated inference of affective states from neural oscillations using advanced signal processing and deep learning techniques.
- It employs preprocessing, feature extraction, and multi-scale deep architectures, including CNNs, RNNs, and graph networks, to capture spatial, spectral, and temporal dependencies.
- Recent methodologies address domain shift with adversarial training and contrastive learning, achieving high cross-subject and real-time performance on benchmark datasets.
Electroencephalogram (EEG)-based emotion recognition refers to the automated inference of affective states from EEG signals, leveraging the brain’s oscillatory and spatial dynamics as correlates of emotion processing. This rapidly advancing field combines signal processing, neuroscience, and machine learning—especially deep learning—to map transient, high-dimensional neural recordings onto discrete or continuous emotion representations for applications in affective computing, brain–computer interfaces, and neuropsychiatric assessment.
1. Physiological and Computational Foundations
EEG captures noninvasive electrophysiological activity via multi-channel scalp electrodes, primarily reflecting summed postsynaptic potentials of cortical pyramidal cell populations. Emotional states modulate oscillatory power, phase synchrony, and functional organization in specific frequency bands and scalp locations. Classic findings include frontal alpha asymmetry (relatively lower left-frontal alpha, indexing greater left-frontal activation, is associated with positive valence), beta/gamma power increases with arousal, and distinctive connectivity patterns during affect induction (Li et al., 2022).
Emotion is operationalized using either discrete categories (e.g., happy, sad, fear) or continuous dimensions, typically valence and arousal from Russell’s circumplex model, often extended with dominance (the VAD framework). Neural correlates span frontal, temporal, and parietal regions and exhibit individual, session, and context-dependent variability. Consequently, accurate emotion recognition from EEG demands modeling rich spatial, spectral, and temporal dependencies, as well as domain adaptation to new subjects and environments (Zhang et al., 27 Jan 2025, Shen et al., 2024).
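As a concrete illustration of the dimensional labeling scheme, continuous valence–arousal self-reports (e.g., 1–9 rating scales as in DEAP) are commonly binarized at the scale midpoint into four quadrant classes. The midpoint threshold below is a common convention rather than something fixed by the datasets; this is a minimal sketch, not a prescribed procedure:

```python
import numpy as np

def va_to_quadrant(valence, arousal, thresh=5.0):
    """Map continuous valence/arousal ratings (e.g., 1-9 self-assessment
    scales) to one of four quadrant labels. The midpoint threshold is a
    common convention; individual studies vary."""
    hv = np.asarray(valence) >= thresh   # high valence?
    ha = np.asarray(arousal) >= thresh   # high arousal?
    # 0: low-V/low-A, 1: low-V/high-A, 2: high-V/low-A, 3: high-V/high-A
    return (2 * hv + ha).astype(int)

labels = va_to_quadrant([7.2, 3.1, 6.0], [2.5, 8.0, 8.8])  # -> [2, 1, 3]
```

Binary valence-only or arousal-only labels, as in many DEAP results, follow by thresholding a single dimension the same way.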
2. Signal Preprocessing and Feature Extraction
Preprocessing pipelines standardize EEG recordings through artifact removal (using ICA or regression for ocular/muscular contamination), bandpass filtering (typically within 0.5–45 Hz), referencing (e.g., common average), and normalization (z-scoring). EEG signals are segmented into overlapping windows (from 0.5 s to several seconds) to capture temporally resolved features (Rehman et al., 28 Aug 2025, Chandanwala et al., 2024).
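The pipeline above can be sketched in a few lines; the filter order, window length, and overlap below are illustrative placeholders, not values mandated by the cited works:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(eeg, fs, band=(0.5, 45.0), win_s=2.0, overlap=0.5):
    """Bandpass-filter, z-score, and segment a (channels, samples) EEG
    array into overlapping windows, mirroring the pipeline described
    above (artifact removal via ICA is assumed to happen beforehand)."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    x = filtfilt(b, a, eeg, axis=-1)                       # zero-phase bandpass
    x = (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + 1e-8)
    win = int(win_s * fs)
    hop = int(win * (1 - overlap))
    starts = range(0, x.shape[-1] - win + 1, hop)
    return np.stack([x[:, s:s + win] for s in starts])     # (n_win, ch, win)

fs = 128
eeg = np.random.randn(32, 10 * fs)    # 32 channels, 10 s of synthetic data
windows = preprocess(eeg, fs)         # shape (9, 32, 256)
```

Each resulting window is then fed to feature extraction or directly to an end-to-end network.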
Feature extraction strategies include:
- Time-domain: statistical moments (mean, variance, skewness, kurtosis), Hjorth parameters, zero-crossings.
- Frequency-domain: power spectral density (PSD) and band power features in delta (1–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz), and gamma (30–45 Hz) bands via Welch’s method, DWT, or STFT (Chandanwala et al., 2024, Bazgir et al., 2019).
- Time–frequency representations: STFT, continuous/discrete wavelet transforms yield joint temporal and spectral energy matrices.
- Spatial graph features: adjacency matrices based on sensor locations or statistical dependencies (correlation, phase-locking value, coherence), supporting graph-theoretic constructs (Zhong et al., 2019, Moon et al., 2018).
- Connectivity: functional connectivity matrices (PCC, PLV, PLI) encode inter-channel synchrony, essential for emotion classification (Moon et al., 2018).
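The frequency-domain features listed above reduce, in the simplest case, to per-channel band powers computed from a Welch PSD estimate. A minimal sketch (band edges as given above; summed PSD bins rather than an absolute-units integral):

```python
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def band_powers(eeg, fs):
    """Per-channel band power from a Welch PSD estimate.
    eeg: (channels, samples); returns (channels, n_bands).
    PSD bins are summed per band; multiply by the bin width
    for absolute power units."""
    freqs, psd = welch(eeg, fs=fs, nperseg=min(eeg.shape[-1], 2 * fs))
    out = []
    for lo, hi in BANDS.values():
        idx = (freqs >= lo) & (freqs < hi)
        out.append(psd[..., idx].sum(axis=-1))
    return np.stack(out, axis=-1)

fs = 128
eeg = np.sin(2 * np.pi * 10 * np.arange(4 * fs) / fs)[None, :]  # 10 Hz tone
bp = band_powers(eeg, fs)   # power concentrates in the alpha band (index 2)
```

Differential entropy features, widely used on SEED, are closely related: under a Gaussian assumption they are a logarithmic transform of band power.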
Beyond handcrafted features, recent approaches employ data-driven learning of spatial, spectral, and temporal hierarchies from minimally processed or raw time series (Zhou et al., 2024, Wu et al., 2021).
3. Machine Learning and Deep Architectures
EEG-based emotion recognition has progressed from classical machine learning (SVMs, random forests, shallow ANNs) operating on fixed feature vectors (Bazgir et al., 2019, Lakhan et al., 2018) to advanced deep learning systems that integrate spatiotemporal modeling, attention, and graph structure.
Typical architectures include:
- CNNs (2D/3D): extract spatial and spectral patterns from topographic or connectivity images, with recent emphasis on multi-scale kernels and hemispheric asymmetry modules (e.g., MSBAM, SFE-Net) (Wu et al., 2021, Deng et al., 2021).
- RNNs (LSTM, GRU): model temporal dependencies in sequences of feature vectors or raw signals, sometimes in hybrid CNN–RNN frameworks (Rehman et al., 28 Aug 2025).
- GCNs/Graph Attention Networks (GAT): operate on graphs encoding electrode topology or learned adjacency, enabling local/global interaction modeling (Zhang et al., 27 Jan 2025, Zhong et al., 2019).
- Attention and Transformer models: self/multi-head attention for adaptive feature selection, with strong results for cross-domain generalization (Yan et al., 7 Nov 2025, Dolgopolyi et al., 19 Nov 2025).
- Hybrid quantum-classical methods: quantum circuits process bandpower-encoded vectors, with classical networks fusing quantum and conventional features (Chandanwala et al., 2024).
- AutoML and NAS-inspired models: search for optimal architectures to balance accuracy, interpretability, and computational cost (Li et al., 2022).
State-of-the-art models fuse spatiotemporal features at multiple granularities—global brain state, regional modules, and fine-scale channel interactions—often with attention or vector quantization mechanisms to promote discriminative, generalizable representations (Zhang et al., 27 Jan 2025, Shen et al., 2024, Zhu et al., 2023).
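To make the graph-based family concrete, a single GCN propagation step over an electrode graph can be written directly from its definition. This is an illustrative numpy sketch of the generic operation, not an implementation of any cited model; the adjacency here is random, whereas real systems derive it from sensor geometry or connectivity estimates such as PLV:

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph-convolution step, X' = ReLU(D^-1/2 (A+I) D^-1/2 X W),
    over an electrode graph: A is a (ch, ch) adjacency, X is (ch, features)
    node features (e.g., per-channel band powers), W a learnable weight."""
    A_hat = A + np.eye(A.shape[0])                    # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)

rng = np.random.default_rng(0)
X = rng.standard_normal((62, 5))                  # 62 channels x 5 features
A = (rng.random((62, 62)) > 0.8).astype(float)
A = np.maximum(A, A.T)                            # symmetrize
H = gcn_layer(X, A, rng.standard_normal((5, 16))) # (62, 16) node embeddings
```

Stacking such layers, with the adjacency itself made learnable, yields the GCN/GAT variants cited above.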
4. Domain Adaptation, Cross-Subject, and Cross-Dataset Transfer
A major obstacle is “domain shift” caused by inter-subject, inter-session, and cross-dataset variability. To address it, modern pipelines deploy:
- Adversarial training: domain discriminators operate at global or node levels to enforce subject-invariant features (Zhong et al., 2019, Li et al., 2019).
- Contrastive learning: pairing EEG segments with text prompts (CLIP framework) or across subjects/sessions for contrastive embedding alignment, promoting generalization (Yan et al., 7 Nov 2025, Shen et al., 2024).
- Style transfer networks: explicitly disentangle content (task-relevant emotion features) and style (dataset/domain statistics), reconstructing transferable EEG feature spaces via cross-domain transformer decoders and multi-objective loss (Zhou et al., 2023).
- Vector quantization / codebook approaches: restrict network representations to discrete prototypes (as in MIND-EEG), preventing over-smoothing and enhancing class separability (Zhang et al., 27 Jan 2025).
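The intuition behind all of these strategies is to bring source and target feature statistics into agreement. A much simpler classical baseline that makes this explicit, and which is not one of the cited methods, is CORAL-style covariance alignment: whiten source features with their own covariance, then re-color with the target covariance:

```python
import numpy as np

def coral_align(Xs, Xt, eps=1e-5):
    """Align source features to the target domain by whitening with the
    source covariance and re-coloring with the target covariance
    (CORAL-style). Xs, Xt: (samples, features)."""
    def cov(X):
        Xc = X - X.mean(0)
        return Xc.T @ Xc / (len(X) - 1) + eps * np.eye(X.shape[1])
    def mat_pow(C, p):                 # symmetric matrix power via eigh
        w, V = np.linalg.eigh(C)
        return V @ np.diag(np.maximum(w, eps) ** p) @ V.T
    Xc = Xs - Xs.mean(0)
    return Xc @ mat_pow(cov(Xs), -0.5) @ mat_pow(cov(Xt), 0.5) + Xt.mean(0)

rng = np.random.default_rng(1)
Xs = rng.standard_normal((200, 4)) * 3.0 + 1.0   # "subject A" features
Xt = rng.standard_normal((300, 4)) * 0.5 - 2.0   # "subject B" features
Xs_aligned = coral_align(Xs, Xt)                 # matches Xt's mean/covariance
```

The adversarial and contrastive methods above can be viewed as learned, nonlinear generalizations of this kind of distribution matching.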
Empirical results on the SEED, SEED-IV, MPED, DEAP, DREAMER, and FACED datasets confirm that such schemes markedly improve cross-dataset and cross-subject performance at test time, typically narrowing the accuracy gap to within 5–10% of subject-dependent results.
5. Model Performance and Benchmark Datasets
Standard public EEG emotion corpora include:
| Dataset | Subjects | Channels | Labels | Typical Protocols | Reported SOTA Acc. |
|---|---|---|---|---|---|
| SEED | 15 | 62 | 3-class (P/N/Neu) | Subject/Cross-Subject | 94.24% (RGNN), 88.69% (EmotionCLIP) |
| SEED-IV | 15 | 62 | 4-class (Happy/Sad/Fear/Neu) | Subject/Cross-Subject | 79.37% (RGNN), 74.79% (MIND-EEG) |
| MPED | 30 | 62 | 7-class | Cross-dataset | 42.08% (MIND-EEG), 45.6% (E²STN) |
| DEAP | 32 | 32 | Valence/Arousal (1–9) | Subject/Cross-Subject | 99.72% (CNN-PLV, subject-dep.), 88.4% (GCN, arousal) |
| DREAMER | 23 | 14 | V/A/D (1–5) | Subject/Cross-Subject | 94.94% (MS-iMamba, 4 ch., intra-subj.) |
Recent advances report subject-independent (cross-subject) accuracies in the 85–95% range on 3–5-class problems, with subject-dependent performance somewhat higher (up to 99% for binary V/A on DEAP with optimized connectivity+CNN pipelines) (Zhang et al., 27 Jan 2025, Shen et al., 2024, Zhou et al., 2024, Wu et al., 2021, Moon et al., 2018).
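The cross-subject numbers above rest on a leave-one-subject-out (LOSO) protocol: each fold holds out every trial of one subject for testing and trains on the rest. A minimal split generator, with the subject/trial counts below chosen purely for illustration:

```python
import numpy as np

def loso_splits(subject_ids):
    """Leave-one-subject-out cross-validation splits. subject_ids gives
    the subject label of each trial; yields (held-out subject,
    train indices, test indices) per fold."""
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        yield s, np.where(subject_ids != s)[0], np.where(subject_ids == s)[0]

# e.g., 15 subjects (as in SEED) with 4 trials each -> 15 folds
subject_ids = np.repeat(np.arange(1, 16), 4)
folds = list(loso_splits(subject_ids))
```

Reporting the mean and variance of accuracy across all folds, rather than a single split, is what makes cross-subject comparisons between the models above meaningful.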
6. Architectural Innovations and Neuroscience Alignment
Multiple works leverage neuroscientific insights to inform the design of network modules:
- Hemispheric asymmetry: BiHDM, MSBAM, and SFE-Net embed left–right differences by explicit pairwise subtraction, spatial folding, or hemisphere-aware kernels, reflecting known lateralization of affective processing (Li et al., 2019, Wu et al., 2021, Deng et al., 2021).
- Hierarchical spatial modeling: MIND-EEG deploys multi-granularity graph encoders (global, intra-regional, inter-regional) to mirror macroscopic brain organization (Zhang et al., 27 Jan 2025).
- Functional connectivity: CNNs trained on PLV matrices with intentional electrode ordering can exploit inter-regional synchrony and asymmetric patterns, reaching near-perfect binary accuracy (Moon et al., 2018).
- Multi-scale temporal kernels: Diverse 1D/2D convolutions or temporal windows mitigate non-stationarity and enable detection of both transient and sustained affect signatures (Zhou et al., 2024, Wu et al., 2021, Ly et al., 19 Jun 2025).
- Attention and interpretability: Channel- and spatial-attention modules highlight salient sensors and time–frequency patches, making network decisions more explainable and neurobiologically plausible (Yutian et al., 2024, Shen et al., 2024).
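The PLV matrices used as connectivity inputs above follow directly from their definition: the magnitude of the mean phase-difference phasor between each channel pair over time. A minimal sketch using the Hilbert transform for instantaneous phase (real pipelines would first bandpass each channel to the band of interest):

```python
import numpy as np
from scipy.signal import hilbert

def plv_matrix(eeg):
    """Phase-locking value between every channel pair:
    PLV[i, j] = |mean_t exp(i*(phi_i(t) - phi_j(t)))|.
    eeg: (channels, samples); returns a symmetric (ch, ch) matrix
    with unit diagonal."""
    phase = np.angle(hilbert(eeg, axis=-1))
    phasor = np.exp(1j * phase)
    return np.abs(phasor @ phasor.conj().T) / eeg.shape[-1]

fs = 128
t = np.arange(4 * fs) / fs
x = np.sin(2 * np.pi * 10 * t)                    # 10 Hz oscillation
eeg = np.stack([x,
                np.roll(x, 3),                    # phase-shifted copy
                np.random.default_rng(2).standard_normal(len(t))])
plv = plv_matrix(eeg)   # high PLV for the shifted pair, low vs. noise
```

Feeding such matrices to a CNN with a deliberate electrode ordering is what lets the connectivity models above exploit inter-regional synchrony.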
7. Practical Considerations, Challenges, and Future Directions
Critical challenges remain, including:
- Inter-subject variability: Cross-subject calibration is limited by the diversity of neural patterns; emerging solutions include domain adaptation, contrastive learning, and subject-adaptive fine-tuning (Yan et al., 7 Nov 2025, Shen et al., 2024, Zhou et al., 2023).
- Dataset and label constraints: Small training cohorts and subjective labels (often from single post-trial ratings) introduce noise and restrict generalizability (Rehman et al., 28 Aug 2025).
- Hardware and deployment: End-to-end, lightweight, and compressed models (e.g., binarized CNNs, reduced-electrode pipelines) enable real-time emotion recognition on consumer-grade and portable EEG devices, facilitating at-home or embedded applications (Ly et al., 19 Jun 2025, Dolgopolyi et al., 19 Nov 2025, Qiao et al., 2022).
- Explainability: Advanced architectures require explicit justification of learned representations (e.g., attention maps, codebook visualization), relating model behavior to neurophysiology (Zhang et al., 27 Jan 2025, Shen et al., 2024, Yutian et al., 2024).
- Modal fusion and extension: Incorporation of multimodal signals (ECG, eye-tracking, facial video), online adaptation, and transfer learning are major directions for extending robustness and real-world impact (Rehman et al., 28 Aug 2025, Li et al., 2022).
Technical future work suggests integrating dynamic-graph/time-resolved modules, extending vector-quantized/GAN architectures, enhancing multimodal and cross-lingual transfer, and developing subject-agnostic pretraining with large-scale, federated EEG corpora (Zhang et al., 27 Jan 2025, Shen et al., 2024, Rehman et al., 28 Aug 2025).
In summary, EEG-based emotion recognition has undergone a paradigm shift from shallow classifiers on handcrafted features to highly structured, interpretable, and domain-adaptive deep learning architectures. Incorporation of neuroscientific priors, multi-granularity modeling, graph-based analysis, and robust cross-domain generalization has yielded high-accuracy, near real-time systems suitable for in-the-wild affective sensing (Zhang et al., 27 Jan 2025, Rehman et al., 28 Aug 2025, Shen et al., 2024, Yan et al., 7 Nov 2025, Chandanwala et al., 2024, Zhou et al., 2023, Moon et al., 2018). Continued advancement will likely depend on the joint progress of network innovation, neurobiological validation, and expansion of large, diverse, and well-annotated EEG datasets.