SS-MixNet: Spectral-Spatial Mixer Networks
- SS-MixNet is a spectral-spatial mixer network that fuses local feature extraction with global MLP mixers to enhance hyperspectral and MI-EEG classification.
- It integrates 3D convolutions and depthwise attention to achieve superior per-pixel accuracy and robust cross-subject performance in low-label settings.
- The architecture employs parallel spectral and spatial mixing strategies, yielding efficient computation and additive improvements over baseline models.
SS-MixNet (Spectral-Spatial Mixer Network) refers to a class of neural network architectures that exploit both spectral and spatial features through mixer-based designs. These models, prominent in hyperspectral image (HSI) analysis and motor imagery EEG (MI-EEG) classification, bridge local and long-range dependencies in multi-dimensional data and often employ mixers or multi-layer perceptrons (MLPs), convolutional layers, and lightweight attention mechanisms for efficient and effective representation learning. Notably, two representative but domain-distinct incarnations of SS-MixNet are detailed in the context of HSI classification (Alkhatib, 19 Nov 2025) and MI-EEG classification (Autthasan et al., 6 Sep 2024).
1. Architectural Principles
SS-MixNet architectures typically follow a three-stage pipeline, integrating local feature extraction, spectral-spatial mixing, and refined attention mechanisms.
- Local Spectral-Spatial Feature Extraction: In HSI, local 3D convolutions produce low-level embeddings that preserve both spectral and spatial locality, capturing instance-specific patterns in the PCA-reduced band-patch volume (Alkhatib, 19 Nov 2025).
- Spectral and Spatial Mixers: Two parallel MLP-style mixer stacks operate along orthogonal axes:
- Spectral Mixers process feature tensors reshaped such that, for each spatial location, spectral relationships are mixed across bands via MLPs, typically stacked in depth with residual connections and GELU nonlinearities.
- Spatial Mixers permute the tensor to treat each spectral-channel pair as a token over spatial positions and apply similar MLP mixing.
- Channel-wise Attention: Depthwise convolutional attention is applied to the concatenated outputs of the two mixers. Each channel is modulated by a channel-specific convolution followed by a sigmoid gate, emphasizing informative features with minimal added parameters and computational overhead.
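The mixing-and-gating pipeline above can be sketched in NumPy. Shapes, layer widths, and the 3×3 depthwise kernel below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_mix(x, axis, hidden):
    """Token-mixing MLP: mix values along `axis` with a 2-layer MLP + residual."""
    x = np.moveaxis(x, axis, -1)           # bring the mixing axis last
    d = x.shape[-1]
    w1 = rng.standard_normal((d, hidden)) * 0.02
    w2 = rng.standard_normal((hidden, d)) * 0.02
    h = x @ w1
    h = 0.5 * h * (1 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))  # GELU (tanh approx)
    y = x + h @ w2                         # residual connection
    return np.moveaxis(y, -1, axis)

def channel_gate(x):
    """Depthwise 3x3 conv per channel followed by a sigmoid gate."""
    C, H, W = x.shape
    k = rng.standard_normal((C, 3, 3)) * 0.1
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    conv = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            conv += k[:, i, j][:, None, None] * pad[:, i:i + H, j:j + W]
    return x * (1 / (1 + np.exp(-conv)))   # sigmoid-gated channel modulation

# Toy feature cube: 8 spectral bands over a 7x7 spatial patch.
feat = rng.standard_normal((8, 7, 7))
spec = mlp_mix(feat, axis=0, hidden=16)    # spectral mixer: mix across bands
spat = mlp_mix(feat.reshape(8, 49), axis=1, hidden=16).reshape(8, 7, 7)  # spatial mixer
out = channel_gate(np.concatenate([spec, spat], axis=0))
print(out.shape)                           # (16, 7, 7)
```

The two mixers run on the same input along orthogonal axes, and the gate operates on their concatenation, mirroring the parallel-branch design described above.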
In MI-EEG, SS-MixNet denotes a pipeline combining traditional filter-bank common spatial patterns (FBCSP) for spectral-spatial feature construction, with a modern multi-task learning backbone (MIN2Net), integrating autoencoding, deep metric learning, and supervised classification with an adaptive gradient blending mechanism (Autthasan et al., 6 Sep 2024).
2. Methodological Components
Hyperspectral Image Classification Pipeline
- Preprocessing: Input cubes are reduced in the spectral dimension to 15 components via PCA, then partitioned into overlapping cubic patches centered on each pixel.
- Network Details:
- Stacked 3D convolutions yield a joint low-level spectral-spatial feature tensor.
- Spectral mixing applies a two-layer MLP to each spatial location across bands, repeated over several stacked mixer layers.
- Spatial mixing permutes the feature tensor and applies analogous stacked MLPs across spatial locations.
- Concatenated feature tensors undergo depthwise channel-wise convolution, channel gating, and global average pooling before a final softmax classifier.
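The preprocessing stage (PCA band reduction followed by per-pixel patch extraction) can be sketched as below; the scene size, band count, and patch size are illustrative values, not the paper's settings:

```python
import numpy as np

def pca_reduce(cube, n_components):
    """Reduce the spectral dimension of an (H, W, B) cube to n_components via PCA."""
    H, W, B = cube.shape
    flat = cube.reshape(-1, B).astype(np.float64)
    flat -= flat.mean(axis=0)
    # Eigendecomposition of the band covariance; keep the top components.
    cov = flat.T @ flat / (flat.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)
    top = vecs[:, np.argsort(vals)[::-1][:n_components]]
    return (flat @ top).reshape(H, W, n_components)

def extract_patches(cube, size):
    """Extract one (size, size, bands) patch per pixel, reflect-padding borders."""
    H, W, B = cube.shape
    r = size // 2
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="reflect")
    return np.stack([padded[i:i + size, j:j + size]
                     for i in range(H) for j in range(W)])

rng = np.random.default_rng(1)
hsi = rng.random((10, 12, 40))            # toy 10x12 scene with 40 bands
reduced = pca_reduce(hsi, n_components=15)
patches = extract_patches(reduced, size=5)
print(reduced.shape, patches.shape)       # (10, 12, 15) (120, 5, 5, 15)
```

Each patch then becomes one training sample for the per-pixel classifier, with the label taken from the patch's center pixel.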
MI-EEG Spectral–Spatial Preprocessing and MixNet
- FBCSP Feature Extraction: MI-EEG trials are band-pass filtered into multiple frequency bands. CSP is solved for each band to create class-conditional spatial filters; the spectral-spatial features are stacks of CSP-filtered time series across all bands, presented as a single feature tensor per trial.
- Multi-Task Learning with MIN2Net: The architecture includes a convolutional autoencoder for reconstruction, a deep metric learning head using semi-hard triplet loss, and a supervised classification head. Adaptive gradient blending regulates task-specific loss contributions by tracking generalization-overfitting curves per task.
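A minimal sketch of the CSP step inside FBCSP for a two-class band, using the standard generalized-eigenvalue formulation; the channel count, trial count, and number of filters here are illustrative, not the paper's configuration:

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_filters=4):
    """Common Spatial Patterns for two classes.

    trials_*: arrays of shape (n_trials, n_channels, n_samples).
    Returns (n_filters, n_channels) spatial filters that maximize the
    variance ratio between the two classes.
    """
    def mean_cov(trials):
        covs = [t @ t.T / np.trace(t @ t.T) for t in trials]  # normalized covariances
        return np.mean(covs, axis=0)

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigenproblem: Ca w = lambda (Ca + Cb) w.
    vals, vecs = eigh(Ca, Ca + Cb)
    order = np.argsort(vals)
    # Take filters from both ends of the spectrum (most discriminative).
    half = n_filters // 2
    idx = np.concatenate([order[:half], order[-half:]])
    return vecs[:, idx].T

rng = np.random.default_rng(2)
a = rng.standard_normal((20, 8, 250))      # 20 trials, 8 channels, 250 samples
b = rng.standard_normal((20, 8, 250))
W = csp_filters(a, b)
filtered = W @ a[0]                        # CSP-filtered time series for one trial
print(W.shape, filtered.shape)             # (4, 8) (4, 250)
```

In FBCSP this procedure is repeated per frequency band, and the filtered time series from all bands are stacked into the spectral-spatial input tensor for the downstream network.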
3. Training Protocols and Datasets
HSI SS-MixNet (Alkhatib, 19 Nov 2025)
- Datasets: Evaluated on QUH-Tangdaowan (18 land-cover classes, 200 bands, PCA-reduced to 15) and QUH-Qingyun (6 urban classes, PCA-reduced to 15).
- Patch Extraction: overlapping cubic patches are extracted around each labeled pixel.
- Splits: 1% training, 1% validation, 98% testing, stratified per class.
- Optimization: Cross-entropy loss, Adam optimizer, early stopping (patience 10). Batch size 64; convergence typically within 70–80 of the maximum 100 epochs.
MI-EEG SS-MixNet (Autthasan et al., 6 Sep 2024)
- Datasets: Six standard MI-EEG datasets (BCIC-2a, BNCI2015, SMR-BCI, High-Gamma, OpenBMI, BCIC-IV-2b), evaluated in subject-dependent (SD) and subject-independent (SI) settings, with both high-density and 3-channel low-density montages.
- Multi-Task Loss: The combined loss is a weighted sum of the per-task losses,

  $$\mathcal{L}_{\text{total}} = \sum_{k} w_k \,\mathcal{L}_k,$$

  where the $w_k$ are normalized weights updated per task and per epoch via the gradients of the validation and training loss trajectories.
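The adaptive weighting can be sketched as follows. This follows the generic gradient-blending idea of scoring each task by its generalization-to-overfitting ratio; the exact estimator used in the paper may differ, and the loss trajectories below are toy values:

```python
import numpy as np

def blend_weights(train_loss, val_loss, prev_train, prev_val, eps=1e-8):
    """Per-task weights from generalization vs. overfitting trends.

    Each argument holds one entry per task. Generalization is measured
    as the drop in validation loss; overfitting as the growth of the
    train-validation gap. Tasks that generalize well get larger weights.
    """
    gen = prev_val - val_loss                                 # validation improvement
    ovf = (val_loss - train_loss) - (prev_val - prev_train)   # gap growth
    raw = np.maximum(gen, 0) / (np.maximum(ovf, 0) ** 2 + eps)
    return raw / (raw.sum() + eps)                            # normalize weights

# Toy trajectories for 3 tasks: reconstruction, triplet, classification.
w = blend_weights(
    train_loss=np.array([0.40, 0.30, 0.50]),
    val_loss=np.array([0.55, 0.35, 0.52]),
    prev_train=np.array([0.50, 0.40, 0.60]),
    prev_val=np.array([0.60, 0.50, 0.70]),
)
print(w.round(3))  # nonnegative weights that sum to ~1
```

Re-estimating these weights each epoch lets the reconstruction, metric-learning, and classification heads trade off influence as their generalization behavior evolves during training.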
4. Quantitative Results
HSI Classification
| Dataset | SS-MixNet OA | Best Baseline OA | #Params | FLOPs |
|---|---|---|---|---|
| Tangdaowan | 95.68% | 94.50% (3D-CNN) | 140,914 | 1.93M |
| Qingyun | 93.86% | 93.14% (IP-SWIN) | — | — |
- SS-MixNet achieves the highest overall accuracy (OA), average accuracy (AA), and kappa statistics on both benchmarks.
- SS-MixNet attains the best per-class accuracy on a majority of classes (11 of 18 on Tangdaowan).
- The architecture exhibits sharper class boundaries and reduced salt-and-pepper noise in classification maps compared to all tested baselines (2D-CNN, 3D-CNN, IP-SWIN, SimPoolFormer, HybridKAN) (Alkhatib, 19 Nov 2025).
- Computational cost is the lowest among the compared models (0.77M MACs), while SS-MixNet still surpasses larger models such as SimPoolFormer (771K parameters, 57.5M FLOPs) in accuracy.
MI-EEG Classification
| Dataset | SD Acc. (MixNet) | SI Acc. (MixNet) |
|---|---|---|
| BCIC-2a | 77.6% ±15.1 | 69.4% ±11.8 |
| BNCI2015 | 80.0% ±13.2 | 66.2% ±11.7 |
| SMR-BCI | 76.3% ±16.5 | 66.1% ±14.0 |
| High-Gamma | 80.2% ±15.3 | 69.8% ±10.8 |
| OpenBMI | 68.9% ±16.8 | 72.0% ±14.2 |
For three-channel EEG (BCIC-IV-2b), MixNet achieves 77.1% (SD) and 75.7% (SI) accuracy, outperforming all reference models by 2–10% on average (Autthasan et al., 6 Sep 2024).
5. Component Contribution and Ablation Findings
- HSI Ablation:
- Adding the spectral mixer to the 3D-CNN baseline raises OA from 94.20% to 95.07%.
- Adding only the spatial mixer provides 94.89% OA.
- Combining both mixers achieves 95.38% OA.
- The full model, including the channel-wise depthwise attention, attains the final OA of 95.68% (Alkhatib, 19 Nov 2025).
- This indicates that each module contributes an additive gain over the baseline.
- MI-EEG Ablation:
- The FBCSP spectral-spatial preprocessing is essential for encoding discriminative patterns.
- The MIN2Net multi-task module, with adaptive weighting, provides robust generalization across dense and sparse EEG montages.
6. Implementation and Reproducibility
Reference code for HSI SS-MixNet is scheduled for public release at https://github.com/mqalkhatib/SS-MixNet and is structured for exact reproducibility (fixed seeds, comprehensive data and model scripts) (Alkhatib, 19 Nov 2025).
- Main modules:
  - `data_loader.py` (augmentation, PCA, patch extraction, splitting)
  - `models/ss_mixnet.py` (architecture)
  - `train.py` (training, scheduling)
  - `evaluate.py` (metrics and visualization)
- Dependencies include TensorFlow 2.10.0, NumPy, and scikit-learn.
- Reproduction command:
```shell
python train.py --dataset=Tangdaowan --epochs=100 --batch_size=64
```
For MI-EEG MixNet, the architecture, loss functions, and adaptive blending algorithm are fully specified, and evaluation benchmarks and ablation settings are comprehensively reported, enabling replication and further extension (Autthasan et al., 6 Sep 2024).
7. Context and Significance
SS-MixNet architectures advance the state of the art by integrating local, high-dimensional feature extraction with global mixer-based modeling, while maintaining computational tractability. In HSI, this yields robust per-pixel classification under extremely low supervision (1% labeled data). In MI-EEG, the combination of classical spectral–spatial methods with multi-task deep learning and adaptive loss blending enhances cross-subject generalization and supports low-density EEG settings. These methodologies demonstrate strict improvements over baselines using either classical, CNN, or transformer-based alternatives. A plausible implication is that spectral-spatial mixer paradigms, as exemplified by SS-MixNet, can serve as a blueprint for efficient, high-accuracy modeling in other domains characterized by multimodal or multi-axis structure.
References:
- (Alkhatib, 19 Nov 2025) Hyperspectral Image Classification using Spectral-Spatial Mixer Network
- (Autthasan et al., 6 Sep 2024) MixNet: Joining Force of Classical and Modern Approaches Toward the Comprehensive Pipeline in Motor Imagery EEG Classification