Imagined Motor Imagery Classification

Updated 6 May 2026

Imagined Motor Imagery Classification is the process of decoding non-invasively recorded EEG signals to determine imagined limb or body-part movements, foundational for BCIs.
Experimental paradigms range from finger-writing tasks that improve user engagement to multi-class motor imagery, while signal-processing and learning methods (e.g., CSP, CNNs) improve decoding accuracy.
Recent deep learning and hybrid architectures report accuracies such as 92.7% on BCI IV-2a and target the challenges of cross-subject transfer and asynchronous classification for practical BCI deployment.

Imagined motor imagery classification refers to the automated decoding of a subject’s intended limb or body-part movement, performed purely in imagination and recorded non-invasively using electroencephalography (EEG) or related signals. This technology is foundational for brain-computer interfaces (BCIs), with applications in neural prosthetics, rehabilitation, and assistive control systems. State-of-the-art research leverages advanced feature extraction, spatial–temporal modeling, and deep learning to classify complex, often multi-class, imagined movements—including both elementary and combinatorially generated paradigms.

1. Experimental Paradigms and Task Design

Motor imagery EEG classification begins with protocol design tailored to maximize both subject comfort and neurophysiological separability of tasks. Traditional approaches typically employ cue-based left-vs-right hand imagery using abstract markers (arrows), but optimized paradigms include culturally or proprioceptively familiar tasks such as finger-writing Chinese characters or simulating daily upper limb actions.

For example, a writing-based motor imagery protocol guided naive subjects to repeatedly imagine handwriting a five-stroke Chinese character, cued by a composite image of a hand with the character overlaid. Compared with a traditional arrow-cue paradigm, this protocol showed a substantial gain in classification accuracy (79.8% vs. 65.1%, a difference of 14.7 percentage points) and improved user-rated task comfort and engagement (Qiu et al., 2016). Other studies employ multi-class paradigms with three to nine classes: e.g., single-arm reach in six directions, hand grasp, wrist twist, and rest, or seven-class upper-limb datasets including hand open/close, elbow/forearm flexion/extension, pronation/supination, and rest (Lee et al., 2020, Khan et al., 2022). The combinatorial superposition of four canonical imageries (left hand, right hand, feet, tongue) enables up to ten distinct commands, namely the four base classes plus their six pairwise linear mixtures (Korkan et al., 2021).

Beyond motor imagery alone, some protocols interleave sense-related (e.g., imagining touching hot/cold objects) and motor-related (pull/push) conditions, finding distinct spatial activations and moderate MI classification performance (≈53–55% for the best conditions, using DeepConvNet) (Kim et al., 2024).

2. Signal Acquisition, Preprocessing, and Feature Construction

High-quality EEG acquisition in MI-BCI typically employs 20–64 channel caps (often in 10–20 or 10–10 montages) sampled at 250–1000 Hz and bandpass filtered (typically 8–30 Hz for sensorimotor rhythms, SMR; 0.1–60 Hz or 0.5–35 Hz for broader paradigms), with artifact rejection via independent component analysis (ICA), artifact subspace reconstruction (ASR), or stringent subject/epoch exclusion (Sanalitro et al., 29 Aug 2025, Dip et al., 4 Apr 2025).

Spatial re-referencing (e.g., common average reference, CAR), precise epoch extraction (e.g., 3–4 s around the imagery cue), and per-epoch normalization precede feature extraction. The classical approach uses Common Spatial Pattern (CSP) to learn spatial filters that maximize the variance of one MI class while minimizing the variance of the other. CSP features are the log-variances of the signals projected through the most discriminative filters, typically followed by a linear SVM or LDA classifier. Frequency-domain emphasis, notably mu/β power, is captured either by explicit band-power and PSD computations or by convolutional layers set to approximate bandpass characteristics (Kwon et al., 2020, Lee et al., 2020).
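The two-class CSP step described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the pipeline of any cited study: the function names (class_covariance, fit_csp, csp_features) and the n_pairs parameter are hypothetical, covariances are trace-normalized per trial, and the epochs are assumed to be already bandpass filtered.

```python
import numpy as np

def class_covariance(epochs):
    """Mean trace-normalized spatial covariance.
    epochs: array of shape (n_trials, n_channels, n_samples)."""
    covs = []
    for x in epochs:
        x = x - x.mean(axis=1, keepdims=True)  # remove per-channel offset
        c = x @ x.T
        covs.append(c / np.trace(c))
    return np.mean(covs, axis=0)

def fit_csp(epochs_a, epochs_b, n_pairs=2):
    """Return 2 * n_pairs spatial filters (rows) for a two-class problem."""
    ca, cb = class_covariance(epochs_a), class_covariance(epochs_b)
    # Whitening transform P = D^{-1/2} U^T of the composite covariance.
    evals, evecs = np.linalg.eigh(ca + cb)
    whitening = (evecs / np.sqrt(evals)).T
    # Eigenvectors of the whitened class-A covariance, ascending eigenvalues.
    lam, b = np.linalg.eigh(whitening @ ca @ whitening.T)
    filters = b.T @ whitening
    # First rows favor class B variance, last rows favor class A variance.
    keep = np.r_[np.arange(n_pairs), np.arange(len(lam) - n_pairs, len(lam))]
    return filters[keep]

def csp_features(filters, epochs):
    """Log of normalized variance of each CSP-projected signal."""
    feats = []
    for x in epochs:
        var = (filters @ x).var(axis=1)
        feats.append(np.log(var / var.sum()))
    return np.array(feats)
```

The resulting feature matrix has one row per trial and can be passed to any linear classifier; extending the sketch to more than two classes (for example one-vs-rest) is left out here.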

Recent pipelines commonly compute per-channel energy and instantaneous spectral entropy, derive per-band powers (Welch’s or wavelet-based), principal components from time-frequency transforms, or statistical/population metrics (mean, variance, RMS, skewness, kurtosis). Feature selection strategies include mutual information filtering and sequential floating forward selection (SFFS), with SVM or DNN informing the selection criterion (Dip et al., 4 Apr 2025).
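As a rough illustration of the per-channel descriptors listed above, the following sketch computes energy, a normalized spectral entropy, mu and beta band power, and a few statistical moments. It makes several simplifying assumptions: a plain periodogram stands in for Welch's method, the entropy is computed over the whole epoch rather than instantaneously, and the band edges (8–13 Hz, 13–30 Hz) and the function names are illustrative choices rather than values taken from the cited work.

```python
import numpy as np

def band_power(psd, freqs, lo, hi):
    """Integrate the PSD over [lo, hi) Hz for every channel."""
    mask = (freqs >= lo) & (freqs < hi)
    return psd[:, mask].sum(axis=1) * (freqs[1] - freqs[0])

def channel_features(epoch, fs):
    """epoch: (n_channels, n_samples) -> dict of per-channel feature arrays."""
    x = epoch - epoch.mean(axis=1, keepdims=True)
    n = x.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Simple periodogram estimate (no Welch averaging in this sketch).
    psd = np.abs(np.fft.rfft(x, axis=1)) ** 2 / (fs * n)
    # Spectral entropy of the normalized spectrum, scaled to at most 1.
    p = psd / psd.sum(axis=1, keepdims=True)
    entropy = -(p * np.log2(p + 1e-12)).sum(axis=1) / np.log2(psd.shape[1])
    z = x / x.std(axis=1, keepdims=True)  # standardized signal for moments
    return {
        "energy": (x ** 2).sum(axis=1),
        "spectral_entropy": entropy,
        "mu_power": band_power(psd, freqs, 8.0, 13.0),
        "beta_power": band_power(psd, freqs, 13.0, 30.0),
        "rms": np.sqrt((x ** 2).mean(axis=1)),
        "skewness": (z ** 3).mean(axis=1),
        "kurtosis": (z ** 4).mean(axis=1) - 3.0,  # excess kurtosis
    }
```

Concatenating these arrays across channels gives a flat feature vector per epoch, on which a selection step such as mutual information filtering or SFFS could then operate.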

3. Machine Learning and Deep Neural Classification Methods

3.1 Classic and Shallow Models

Initial MI-BCI systems relied on pipelines that pair CSP spatial filtering with LDA or SVM classifiers (Qiu et al., 2016, Kwon et al., 2020), but these are limited in multi-class problems, cross-subject generalization, and asymmetric motor imagery paradigms. Feature-based SVMs using per-channel energy and entropy achieve near-perfect (99.5%) within-subject accuracies, though with a risk of overfitting and limited cross-user transfer (Sanalitro et al., 29 Aug 2025).

3.2 Deep Convolutional Architectures

End-to-end convolutional neural networks (CNNs), such as ShallowConvNet, DeepConvNet, and EEGNet, directly learn spatial and temporal filters from raw or lightly preprocessed EEG. A two-block “band-power feature refining” CNN (BFR-CNN) was shown to outperform both DeepConvNet and EEGNet for four-class, single-arm imagined movements (accuracy 0.84 ± 0.04) (Lee et al., 2020). Hierarchical CNN architectures that segregate coarse classes (e.g., arm vs. hand MI) before fine-grained sub-branches yield robust performance in high-dimensional, intuitive MI paradigms: e.g., 0.82–0.88 accuracy for 3-/5-class, and 0.63–0.66 for 7-class, outperforming all baselines (Lee et al., 2020).
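The coarse-to-fine idea behind hierarchical classifiers can be illustrated without any network code. The sketch below assumes a coarse head that outputs group probabilities and one fine head per group that outputs conditional probabilities; it combines them multiplicatively. This soft combination, the function name, and the example labels are assumptions for illustration, and the cited architecture may route trials differently.

```python
def hierarchical_predict(coarse_probs, fine_probs):
    """Combine coarse and fine outputs into joint class probabilities.

    coarse_probs: {group: P(group)}
    fine_probs:   {group: {label: P(label | group)}}
    Returns the most probable fine label and the full joint distribution.
    """
    joint = {
        label: coarse_probs[group] * p
        for group, branch in fine_probs.items()
        for label, p in branch.items()
    }
    best = max(joint, key=joint.get)
    return best, joint


# Hypothetical outputs for one trial.
coarse = {"arm": 0.7, "hand": 0.3}
fine = {
    "arm": {"reach_left": 0.6, "reach_right": 0.4},
    "hand": {"grasp": 0.9, "twist": 0.1},
}
label, joint = hierarchical_predict(coarse, fine)
```

In this example a confident hand sub-branch (grasp at 0.9) still loses to the arm branch because the coarse stage assigns the hand group only 0.3, which shows how coarse-stage errors propagate to the fine decision.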

Deep fusion models integrate spatial and temporal encoding. A representative example is 3D-CLMI, which inputs spatially stacked, windowed EEG to a multiscale 3D-CNN, extracts temporal features via LSTM, and jointly weighs time steps with an attention mechanism. This architecture attains 92.7% mean accuracy and F1 = 0.91 on BCI IV-2a, with advantages attributed to 3D spatial kernel fusion and attention over the LSTM output (Cheng et al., 2023). Other pipelines use CNN-LSTM hybrids or pre-trained VGG-16/19 acting on STFT spectrograms, achieving 87.07% for seven-class, upper limb MI decoding—a substantial improvement over CSP-LDA and shallow deep nets (Khan et al., 2022, Hwaidi et al., 22 Aug 2025).
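The attention step that weighs recurrent time steps can be sketched as a softmax over per-step scores followed by a weighted sum. This is a generic dot-product attention pooling written in NumPy, not the exact mechanism of 3D-CLMI; the query vector stands in for a learned parameter and is purely hypothetical.

```python
import numpy as np

def attention_pool(hidden, query):
    """Softmax-weighted sum of recurrent outputs.

    hidden: (n_steps, d) outputs of a recurrent layer
    query:  (d,) scoring vector (a learned parameter in a real model)
    Returns the pooled (d,) context vector and the (n_steps,) weights.
    """
    scores = hidden @ query
    weights = np.exp(scores - scores.max())  # subtract max for stability
    weights /= weights.sum()
    return weights @ hidden, weights
```

With a zero query all time steps receive equal weight and the result reduces to mean pooling, which makes a convenient sanity check.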

3.3 Modern Time–Frequency and Geometric Models

Approaches exploiting Riemannian geometry and cross-frequency coupling apply tangent-space mapping to trial covariance matrices (RTS) extracted over dichotomous filter banks (DFB), then classify the result with lightweight CNNs and center loss. On BCI IV-2a, DFBRTS achieves 78.16% (4-class), outperforming FBCNet and traditional bandpower models (Xiong et al., 2023). In a separate line of work, MiniRocket applies roughly 10k fixed convolutional kernels with proportion-of-positive-values (PPV) pooling, followed by a ridge regression classifier, delivering 98.63% accuracy at millisecond-scale inference speed (Hwaidi et al., 22 Aug 2025).
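Tangent-space mapping of covariance matrices can be sketched as follows. The code whitens each trial covariance by a reference matrix, takes the matrix logarithm, and vectorizes the upper triangle with off-diagonal entries scaled by the square root of two so that Euclidean norms are preserved. Using the arithmetic mean as the default reference is a simplifying assumption (the Riemannian mean is the more common choice), and the function names are illustrative.

```python
import numpy as np

def _sym_fn(mat, fn):
    """Apply a scalar function to a symmetric matrix via its eigenvalues."""
    w, v = np.linalg.eigh(mat)
    return (v * fn(w)) @ v.T

def tangent_space(covs, ref=None):
    """Map SPD matrices to tangent-space vectors.

    covs: (n_trials, c, c) covariance matrices
    ref:  (c, c) reference point; defaults to the arithmetic mean (assumption)
    Returns an array of shape (n_trials, c * (c + 1) / 2).
    """
    if ref is None:
        ref = covs.mean(axis=0)
    ref_isqrt = _sym_fn(ref, lambda w: 1.0 / np.sqrt(w))
    c = covs.shape[1]
    iu = np.triu_indices(c)
    weights = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    out = []
    for cov in covs:
        log_whitened = _sym_fn(ref_isqrt @ cov @ ref_isqrt, np.log)
        out.append(weights * log_whitened[iu])
    return np.array(out)
```

A trial whose covariance equals the reference maps to the zero vector, so the output can be read as a set of deviations from the reference in a locally Euclidean space.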

Graph-based approaches, e.g., MutualGraphNet, use per-electrode mutual information to define adjacency for spatial–temporal graph convolution networks (ST-GCN). These architectures exploit pairwise dependencies for subject- and class-specific spatial filtering, surpassing CNN and FBCSP baselines (accuracy 0.5190, F1 0.5175 on 4-class SMR) (Li et al., 2021).
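A mutual-information adjacency matrix of the kind described above can be approximated with a histogram estimator. The sketch below is a generic plug-in estimate in nats, not MutualGraphNet's implementation; the bin count and function names are assumptions, and the estimator is biased upward for short recordings.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of mutual information in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def mi_adjacency(signals, bins=8):
    """signals: (n_channels, n_samples) -> symmetric adjacency matrix
    with zero diagonal and pairwise mutual information elsewhere."""
    n = signals.shape[0]
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            adj[i, j] = adj[j, i] = mutual_information(signals[i], signals[j], bins)
    return adj
```

The matrix can then be thresholded or normalized before it is used as the adjacency of a graph convolution layer; those steps are omitted here.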

3.4 Transformer, State Space, and Hybrid Architectures

Transformer-based and state-space models capture long-range EEG dependencies and context. EEGEncoder fuses TCN and stable Transformer blocks in a parallel-ensemble (five-branch) structure, integrating causal temporal and contextual spatial information, achieving mean 86.5% accuracy and κ ≈ 83.3% on BCI IV-2a (Liao et al., 2024). Spatiotemporal MambaNet employs selective SSMs (Mamba) in parallel for spatial and temporal tokenization, fusing the outputs via convolutional heads for efficiency and scalability. STMambaNet consistently exceeds 82% (4-class) and 89% (2-class) on BCI IV datasets (Yang et al., 2024).
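Because results in this area are often reported as both accuracy and Cohen's kappa, a short sketch of how kappa follows from a confusion matrix may be useful. The confusion matrix below is a hypothetical four-class example and is not taken from any cited study.

```python
import numpy as np

def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, cols: predicted)."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    p_observed = np.trace(confusion) / total
    # Chance agreement from the product of row and column marginals.
    p_expected = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total ** 2
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical balanced four-class result with 80% accuracy.
example = [
    [80, 10, 5, 5],
    [5, 80, 10, 5],
    [5, 5, 80, 10],
    [10, 5, 5, 80],
]
kappa = cohen_kappa(example)
```

For a balanced four-class problem the chance agreement is 0.25, so kappa reduces to (accuracy − 0.25) / 0.75, which gives about 0.733 for the example above.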

Efficient pooling of spatiotemporal patterns is also achieved via topological map generation (t-SNE-based), InternImage deformable large-kernel CNNs, and PoolFormer-inspired 2D average pooling over time–feature frames; this yields significant gains in multi-class cross-subject accuracy (up to 88.57% for 2-class, 70.17% for 4-class MI) (Fukushima et al., 2024).

4. Multi-class Expansion, Combined MI, and Command Scaling

A central challenge is extending from binary or four-class MI to richer command sets without degrading separability. Korhan et al. demonstrated that pairwise superposition of four elementary MI signals (LH, RH, F, T) generates up to ten commands. Artificially combined MI signals—produced by linear averaging of two simple MI epochs—can be classified with state-of-the-art small CNNs using minimum-distance Walsh coding (DivFE), yielding 77.8% (10-class, four base + six combined) on BCI IV-2a and 76.5% on real-time acquired data. The neurophysiological validity of this combinatorial expansion is supported by ERD/ERS analyses that show the expected linear superposition in C3/C4 (Korkan et al., 2021).
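The command-expansion scheme can be sketched in two parts: forming combined epochs by averaging two base-class epochs, and decoding a network output by its nearest codeword in a Walsh–Hadamard codebook. This is an illustrative reconstruction under stated assumptions, not the DivFE implementation: the codeword length of 16, the assignment of codewords to classes, and all function names are hypothetical.

```python
import numpy as np
from itertools import combinations

BASE = ["LH", "RH", "F", "T"]
# Four base classes plus six pairwise combinations give ten commands.
CLASSES = [(b,) for b in BASE] + list(combinations(BASE, 2))

def combine_epochs(epoch_a, epoch_b):
    """Synthesize a combined-MI epoch as the average of two base epochs."""
    return 0.5 * (epoch_a + epoch_b)

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

# One mutually orthogonal +/-1 codeword per class; the all-ones row is skipped.
CODEBOOK = hadamard(16)[1:len(CLASSES) + 1]

def decode(output):
    """Return the class whose codeword is closest to the network output."""
    dists = np.linalg.norm(CODEBOOK - output, axis=1)
    return CLASSES[int(np.argmin(dists))]
```

Because the codewords are mutually orthogonal, every pair is separated by the same Euclidean distance, so no class is systematically easier to confuse with another at the decoding stage.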

Likewise, frameworks capable of >5-class single-arm or upper-limb MI have been reported, achieving average accuracies of 63–88% across paradigms, with role-assigned architectures promoting more discriminative representations (Lee et al., 2020, Khan et al., 2022).

5. Robustness, Cross-Subject Transfer, and Asynchronous MI

Person-specific modeling with per-subject feature sets and classifiers, especially when using small channel counts over motor areas and robust artifact rejection, can yield up to 99.5% accuracy for within-subject, three-class problems (Sanalitro et al., 29 Aug 2025). However, cross-subject transfer remains challenging due to variability in SMR topography, cognitive strategies, and SNR. Evaluations that hold out entire subjects, such as leave-one-subject-out cross-validation (LOSO-CV), or that use subject-independent feature selection consistently show reduced accuracy relative to within-subject approaches, with typical drop-offs of 6–20% depending on the class set and pipeline (Xiong et al., 2023, Dip et al., 4 Apr 2025).
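Leave-one-subject-out evaluation is straightforward to express as a split generator. The sketch below is a minimal pure-Python version; the function name and the example subject labels are illustrative.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    subject_ids: sequence giving the subject label of every trial.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Five trials from three hypothetical subjects.
splits = list(loso_splits(["s1", "s1", "s2", "s3", "s3"]))
```

Each fold trains on all trials from the remaining subjects and tests on the held-out subject only, so no subject contributes data to both sides of a split. Any preprocessing that is fitted to data (normalization, CSP filters, feature selection) should be fitted inside the training indices of each fold to avoid leakage.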

For real-world or online deployment, asynchronous BCIs must both detect MI onset and classify the MI segment in continuous EEG. The SWPC pipeline addresses this by combining sliding-window prescreening (rest vs. MI) and MI classification modules, both trained with supervised and self-supervised contrastive learning. Across multiple BNCI datasets, SWPC attains ≈2% improvement over the best baselines for both within- and cross-subject validation, suggesting that modular, self-supervised feature shaping can make MI-BCIs more robust to spontaneous timing and transition-state noise (Wu et al., 2024).
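The two-stage structure of an asynchronous decoder (prescreen each window, then classify only the windows flagged as MI) can be sketched as below. This shows the control flow only and is not the SWPC implementation: the is_mi and classify callables are hypothetical stand-ins for trained models, and the toy variance rules in the usage example are placeholders.

```python
import numpy as np

def sliding_windows(n_samples, window, step):
    """Start/stop sample indices of consecutive, possibly overlapping windows."""
    return [(s, s + window) for s in range(0, n_samples - window + 1, step)]

def asynchronous_decode(eeg, window, step, is_mi, classify):
    """Scan continuous EEG and classify only windows flagged as MI.

    eeg:      (n_channels, n_samples) continuous recording
    is_mi:    callable(segment) -> bool, the rest-vs-MI prescreening stage
    classify: callable(segment) -> label, the MI classification stage
    Returns a list of (start, stop, label) tuples.
    """
    events = []
    for start, stop in sliding_windows(eeg.shape[1], window, step):
        segment = eeg[:, start:stop]
        if is_mi(segment):
            events.append((start, stop, classify(segment)))
    return events


# Toy stream: channel 0 is active between samples 400 and 600.
eeg = np.zeros((2, 1000))
eeg[0, 400:600] = np.tile([1.0, -1.0], 100)
events = asynchronous_decode(
    eeg, window=200, step=100,
    is_mi=lambda seg: seg.var() > 0.1,  # placeholder detector
    classify=lambda seg: "left" if seg[0].var() > seg[1].var() else "right",
)
```

Overlapping windows produce several detections for a single imagery episode, so a practical system would usually merge or debounce consecutive events before issuing a command.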

6. Interpretability, Neurophysiological Validation, and Future Directions

Several studies report explicit mapping of discriminative features to neurophysiological correlates. PSD, band-power maps, and CSP patterns are consistently localized to contralateral C3/C4 regions for hand MI, with topographic shifts (posterior vs. anterior sensorimotor cortex) distinguishing sensory (hot/cold) and motor (push/pull) MI conditions (Kim et al., 2024, Dip et al., 4 Apr 2025). Channel saliency and feature selection analyses reinforce the classic homuncular localization for imagined hand, feet, and tongue tasks.

Ongoing trends in imagined MI classification include:

Deep multimodal fusion (CNN–LSTM, 3D-CNN, transformer-temporal hybrids)
Data-driven feature and connectivity discovery (mutual information, tangent-space mapping, graph adjacency)
Efficient, scalable pooling and state-space modeling (PoolFormer, Mamba)
Fine-tuned data augmentation, transfer learning, and self-supervised contrastive pretraining
Expanding class sets by combinatorial mixing and neurophysiologically valid synthesized commands

Challenges remain in achieving robust subject-independent classification, developing reliable asynchronous control for BCIs, and exploring the limits of class proliferation without loss of accuracy. Methodological advances in spatial/temporal adaptivity, continual learning, and neurophysiological validation are likely to further increase the practical impact of imagined motor imagery classification in rehabilitation, neural prosthetics, and adaptive assistive devices.

Selected Performance Table for Multi-Class Imagined MI Classification

Method/Model	MI Task Classes	Accuracy (mean, %)	Dataset / Context
Writing-based CSP+SVM (Qiu et al., 2016)	2 (L/R hand)	79.8	Inexperienced users, optimized paradigm
ERA-CNN (Lee et al., 2020)	3/5/7 (arm/hand)	88/82/63–66	Single-arm high-dim. MI, subject-specific
VGG-16 Spectrogram (Khan et al., 2022)	7 upper-limb	87.1	Per-subject tuning, public dataset
3D-CLMI (CNN+LSTM+attention) (Cheng et al., 2023)	4	92.7, F1=0.91	BCI IV-2a, cross-subject CV
DFBRTS (tangent-space CNN) (Xiong et al., 2023)	4	78.16	BCI IV-2a, cross-session
MiniRocket+Ridge (Hwaidi et al., 22 Aug 2025)	4	98.63	PhysioNet MI, 10 subjects
STMambaNet (spatiotemporal SSM) (Yang et al., 2024)	4	82.37	BCI IV-2a, cross-subject
Arrow-cue baseline (CSP+LDA) (Qiu et al., 2016)	2 (L/R hand)	65.1	Traditional arrow-cue paradigm (baseline)

The cited works are referenced by arXiv paper ID; consult them for further details on task structure, methodologies, and performance outcomes.