
SS-MixNet: Spectral-Spatial Mixer Networks

Updated 26 November 2025
  • SS-MixNet is a spectral-spatial mixer network that fuses local feature extraction with global MLP mixers to enhance hyperspectral and MI-EEG classification.
  • It integrates 3D convolutions and depthwise attention to achieve superior per-pixel accuracy and robust cross-subject performance in low-label settings.
  • The architecture employs parallel spectral and spatial mixing strategies, yielding efficient computation and additive improvements over baseline models.

SS-MixNet (Spectral-Spatial Mixer Network) refers to a class of neural network architectures that exploit both spectral and spatial features through mixer-based designs. These models, prominent in hyperspectral image (HSI) analysis and motor imagery EEG (MI-EEG) classification, capture both local and long-range dependencies in multi-dimensional data, typically combining MLP-style mixers, convolutional layers, and lightweight attention mechanisms for efficient and effective representation learning. Notably, two representative but domain-distinct incarnations of SS-MixNet are detailed in the context of HSI classification (Alkhatib, 19 Nov 2025) and MI-EEG classification (Autthasan et al., 6 Sep 2024).

1. Architectural Principles

SS-MixNet architectures typically follow a three-stage pipeline, integrating local feature extraction, spectral-spatial mixing, and refined attention mechanisms.

  • Local Spectral-Spatial Feature Extraction: In HSI, local 3D convolutions with $3\times3\times3$ kernels produce low-level embeddings that preserve both spectral and spatial locality, capturing instance-specific patterns in the reduced band-patch volume ($\mathbf{F} \in \mathbb{R}^{M \times M \times P \times D}$, e.g., $M=9$, $P=15$, $D=32$) (Alkhatib, 19 Nov 2025).
  • Spectral and Spatial Mixers: Two parallel MLP-style mixer stacks operate along orthogonal axes:
    • Spectral Mixers process feature tensors reshaped such that, for each spatial location, spectral relationships are mixed across bands via MLPs, typically stacked with residual connections and nonlinearities (GELU), with depth $L$ (e.g., $L=4$).
    • Spatial Mixers permute the tensor to treat each spectral-channel pair as a token over spatial positions and apply similar MLP mixing.
  • Channel-wise Attention: Depthwise convolutional attention is applied on the concatenated outputs of the mixers. Each channel is modulated via a channel-specific convolution and a sigmoid gate to emphasize informative features with minimal added parameters and computational overhead. A sketch of these blocks follows this list.
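The following is a minimal Keras sketch of this three-stage pipeline. It is an illustration only, not the authors' released code: hidden sizes, the token reshaping, and the exact layer arrangement are assumptions consistent with the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv3d_stem(x, depth=32):
    # Local spectral-spatial embedding: 3x3x3 convolutions over the
    # (M, M, P, 1) patch volume, yielding F with D=depth feature maps.
    return layers.Conv3D(depth, kernel_size=3, padding="same",
                         activation="gelu")(x)

def mlp_mix(x, hidden_dim, num_blocks):
    # Residual two-layer MLP (GELU) applied along the last axis, num_blocks times.
    for _ in range(num_blocks):
        skip = x
        h = layers.Dense(hidden_dim, activation="gelu")(x)
        h = layers.Dense(skip.shape[-1])(h)
        x = layers.Add()([skip, h])
    return x

def build_ss_mixnet(M=9, P=15, D=32, L=4, hidden=128, num_classes=18):
    inp = layers.Input(shape=(M, M, P, 1))
    f = conv3d_stem(inp, depth=D)                    # (M, M, P, D)
    tokens = layers.Reshape((M * M, P * D))(f)       # spatial tokens x spectral features
    spectral = mlp_mix(tokens, hidden, L)            # mix across the spectral axis
    spatial = layers.Permute((2, 1))(tokens)         # (P*D, M*M)
    spatial = mlp_mix(spatial, hidden, L)            # mix across spatial positions
    spatial = layers.Permute((2, 1))(spatial)
    merged = layers.Concatenate(axis=-1)([spectral, spatial])
    fmap = layers.Reshape((M, M, 2 * P * D))(merged)
    # Depthwise channel attention: per-channel conv plus a sigmoid gate.
    gate = layers.DepthwiseConv2D(3, padding="same", activation="sigmoid")(fmap)
    fmap = layers.Multiply()([fmap, gate])
    pooled = layers.GlobalAveragePooling2D()(fmap)
    out = layers.Dense(num_classes, activation="softmax")(pooled)
    return tf.keras.Model(inp, out)
```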

In MI-EEG, SS-MixNet denotes a pipeline combining traditional filter-bank common spatial patterns (FBCSP) for spectral-spatial feature construction with a modern multi-task learning backbone (MIN2Net), integrating autoencoding, deep metric learning, and supervised classification under an adaptive gradient blending mechanism (Autthasan et al., 6 Sep 2024).

2. Methodological Components

Hyperspectral Image Classification Pipeline

  • Preprocessing: Input cubes $\mathbf{I}_{\text{orig}} \in \mathbb{R}^{H \times W \times C}$ are reduced along the spectral dimension via PCA to $\mathbf{I}_{\text{red}} \in \mathbb{R}^{H \times W \times P}$ ($P \ll C$), then partitioned into overlapping cubic patches $\mathbf{X} \in \mathbb{R}^{M \times M \times P}$; a preprocessing sketch follows this list.
  • Network Details:
    • Stacked 3D convolutions yield $\mathbf{F}$.
    • Spectral mixing applies a parallel two-layer MLP to each spatial location across $P$ bands, repeated $L$ times.
    • Spatial mixing processes permutations of $\mathbf{F}$ across spatial locations, also with stacked MLPs.
    • Concatenated feature tensors undergo depthwise channel-wise convolution, channel gating, and global average pooling before a final softmax classifier.
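A compact sketch of the preprocessing step, assuming scikit-learn's PCA and reflect padding at the image border (the padding scheme is not specified in the source):

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_reduce(cube, n_components=15):
    # cube: (H, W, C) hyperspectral image; keep n_components spectral PCs.
    H, W, C = cube.shape
    flat = cube.reshape(-1, C).astype(np.float64)
    reduced = PCA(n_components=n_components).fit_transform(flat)
    return reduced.reshape(H, W, n_components)

def extract_patches(cube, patch_size=9):
    # One overlapping patch_size x patch_size x P cube centered on each pixel.
    m = patch_size // 2
    padded = np.pad(cube, ((m, m), (m, m), (0, 0)), mode="reflect")
    H, W, P = cube.shape
    patches = np.empty((H * W, patch_size, patch_size, P), dtype=cube.dtype)
    for i in range(H):
        for j in range(W):
            patches[i * W + j] = padded[i:i + patch_size, j:j + patch_size, :]
    return patches
```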

MI-EEG Spectral–Spatial Preprocessing and MixNet

  • FBCSP Feature Extraction: MI-EEG trials $X_i \in \mathbb{R}^{N_c \times t}$ are band-pass filtered into $N_b$ frequency bands. CSP is solved for each band to create class-conditional spatial filters; the spectral-spatial features are stacks of CSP-filtered time series across all bands, presented as $(1, t, U \cdot N_b)$ tensors (see the CSP sketch below).
  • Multi-Task Learning with MIN2Net: The architecture includes a convolutional autoencoder for reconstruction, a deep metric learning head using semi-hard triplet loss, and a supervised classification head. Adaptive gradient blending regulates task-specific loss contributions by tracking generalization-overfitting curves per task.
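The CSP step can be sketched as follows for one frequency band and two classes, using the standard generalized-eigendecomposition formulation; the trace normalization and filter-selection convention are textbook choices rather than details taken from the paper:

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_pairs=2):
    # trials_*: lists of (N_c, t) arrays (channels x time) for each class.
    def mean_cov(trials):
        covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
        return np.mean(covs, axis=0)

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # Solve Ca w = lambda (Ca + Cb) w; eigenvalues return in ascending order.
    vals, vecs = eigh(Ca, Ca + Cb)
    # Keep the n_pairs most discriminative filters from each end of the spectrum.
    picks = np.concatenate([np.arange(n_pairs), np.arange(-n_pairs, 0)])
    return vecs[:, picks].T  # (2 * n_pairs, N_c) spatial filters W

# Applying W to a trial X gives the CSP-filtered series W @ X; stacking these
# across all N_b bands yields the (1, t, U * N_b) spectral-spatial tensor.
```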

3. Training Protocols and Datasets

HSI Classification

  • Datasets: Evaluated on QUH-Tangdaowan (18 land-cover classes, ~200 spectral bands, PCA-reduced to 15) and QUH-Qingyun (6 urban classes, PCA-reduced to 15).
  • Patch Extraction: $9\times9\times15$ cubes per pixel.
  • Splits: 1% training, 1% validation, 98% testing, stratified per class.
  • Optimization: Cross-entropy loss, Adam optimizer ($10^{-3}$ initial learning rate), early stopping (patience 10), batch size 64; training typically converges within 70–80 of the allotted 100 epochs. A training-configuration sketch follows below.
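The stated HSI recipe translates into roughly the following Keras configuration; restore_best_weights and the generic model/data arguments are assumptions:

```python
import tensorflow as tf

def train_hsi_model(model, x_train, y_train, x_val, y_val):
    # Cross-entropy loss, Adam at 1e-3, early stopping with patience 10,
    # batch size 64, up to 100 epochs, as stated above.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=10, restore_best_weights=True)
    return model.fit(x_train, y_train, validation_data=(x_val, y_val),
                     epochs=100, batch_size=64, callbacks=[stop])
```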
MI-EEG Classification

  • Datasets: Six standard MI-EEG datasets (BCIC-2a, BNCI2015, SMR-BCI, High-Gamma, OpenBMI, BCIC-IV-2b), evaluated in subject-dependent (SD) and subject-independent (SI) settings, with both high-density and 3-channel low-density montages.
  • Multi-Task Loss: The combined loss is weighted adaptively:

$$\mathcal{L}(n) = \sum_{m=1}^{3} w^{(m)}(n)\,\mathcal{L}^{(m)}(n)$$

where $w^{(m)}(n)$ are normalized weights updated per task and per epoch from the validation- and training-loss trajectories; an illustrative weighting heuristic is sketched below.
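As an illustration of the idea (not MixNet's exact update rule), a heuristic in the spirit of the gradient-blending literature sets each task's weight proportional to its generalization gain over its squared overfitting change between two checkpoints:

```python
import numpy as np

def blending_weights(train_loss, val_loss, n, n0, eps=1e-8):
    # train_loss, val_loss: (num_tasks, num_epochs) loss trajectories;
    # n0 < n are the checkpoint epochs being compared. This weighting rule
    # is an assumption borrowed from gradient-blending work, not the paper's.
    G = val_loss[:, n0] - val_loss[:, n]                    # generalization gain
    dO = (val_loss[:, n] - train_loss[:, n]) - \
         (val_loss[:, n0] - train_loss[:, n0])              # overfitting change
    raw = np.clip(G / (dO ** 2 + eps), eps, None)
    return raw / raw.sum()                                  # normalized w^(m)(n)
```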

4. Quantitative Results

HSI Classification

| Dataset    | SS-MixNet OA | Best Baseline OA | #Params | FLOPs |
|------------|--------------|------------------|---------|-------|
| Tangdaowan | 95.68%       | 94.50% (3D-CNN)  | 140,914 | 1.93M |
| Qingyun    | 93.86%       | 93.14% (IP-SWIN) | —       | —     |
  • SS-MixNet achieves the highest overall accuracy (OA), average accuracy (AA), and kappa statistics on both benchmarks.
  • SS-MixNet attains the best per-class accuracy on a majority of classes (11 of 18 on Tangdaowan).
  • The architecture exhibits sharper class boundaries and reduced salt-and-pepper noise in classification maps compared to all tested baselines (2D-CNN, 3D-CNN, IP-SWIN, SimPoolFormer, HybridKAN) (Alkhatib, 19 Nov 2025).
  • Computational cost is low (0.77M MACs), while accuracy surpasses that of larger models such as SimPoolFormer (771K parameters, 57.5M FLOPs).

MI-EEG Classification

| Dataset    | SD Acc. (MixNet) | SI Acc. (MixNet) |
|------------|------------------|------------------|
| BCIC-2a    | 77.6% ± 15.1     | 69.4% ± 11.8     |
| BNCI2015   | 80.0% ± 13.2     | 66.2% ± 11.7     |
| SMR-BCI    | 76.3% ± 16.5     | 66.1% ± 14.0     |
| High-Gamma | 80.2% ± 15.3     | 69.8% ± 10.8     |
| OpenBMI    | 68.9% ± 16.8     | 72.0% ± 14.2     |

For three-channel EEG (BCIC-IV-2b), MixNet achieves 77.1% (SD) and 75.7% (SI) accuracy, outperforming all reference models by 2–10% on average (Autthasan et al., 6 Sep 2024).

5. Component Contribution and Ablation Findings

  • HSI Ablation:
    • Adding the spectral mixer to the 3D-CNN baseline raises OA from 94.20% to 95.07%.
    • Adding only the spatial mixer provides 94.89% OA.
    • Combining both mixers achieves 95.38% OA.
    • The full model, including the channel-wise depthwise attention, attains the final OA of 95.68% (Alkhatib, 19 Nov 2025).
    • This indicates that each module contributes a complementary, monotonically increasing gain.
  • MI-EEG Ablation:
    • The FBCSP spectral-spatial preprocessing is essential for encoding discriminative patterns.
    • The MIN2Net multi-task module, with adaptive weighting, provides robust generalization across dense and sparse EEG montages.

6. Implementation and Reproducibility

Reference code for HSI SS-MixNet is scheduled for public release at https://github.com/mqalkhatib/SS-MixNet and is structured for exact reproducibility (fixed seeds, comprehensive data and model scripts) (Alkhatib, 19 Nov 2025).

  • Main modules: data_loader.py (augmentation, PCA, extraction, splitting), models/ss_mixnet.py (architecture), train.py (training, scheduling), evaluate.py (metrics and visualization).
  • Dependencies include TensorFlow 2.10.0, numpy, and scikit-learn.
  • Reproduction command:

    python train.py --dataset=Tangdaowan --epochs=100 --batch_size=64

For MI-EEG MixNet, the architecture, loss functions, and adaptive blending algorithm are fully specified, and evaluation benchmarks and ablation settings are comprehensively reported, enabling replication and further extension (Autthasan et al., 6 Sep 2024).

7. Context and Significance

SS-MixNet architectures advance the state of the art by integrating local, high-dimensional feature extraction with global mixer-based modeling, while maintaining computational tractability. In HSI, this yields robust per-pixel classification under extremely low supervision (1% labeled data). In MI-EEG, the combination of classical spectral-spatial methods with multi-task deep learning and adaptive loss blending enhances cross-subject generalization and supports low-density EEG settings. These methodologies demonstrate consistent improvements over classical, CNN-based, and transformer-based baselines. A plausible implication is that spectral-spatial mixer paradigms, as exemplified by SS-MixNet, can serve as a blueprint for efficient, high-accuracy modeling in other domains characterized by multimodal or multi-axis structure.

References:

  • (Alkhatib, 19 Nov 2025) Hyperspectral Image Classification using Spectral-Spatial Mixer Network
  • (Autthasan et al., 6 Sep 2024) MixNet: Joining Force of Classical and Modern Approaches Toward the Comprehensive Pipeline in Motor Imagery EEG Classification