EEG-based Epileptic Prediction via a Two-stage Channel-aware Set Transformer Network

Published 21 Jul 2025 in eess.SP, cs.AI, and cs.LG | (2507.15364v1)

Abstract: Epilepsy is a chronic, noncommunicable brain disorder, and sudden seizure onsets can significantly impact patients' quality of life and health. However, wearable seizure-predicting devices are still limited, partly due to the bulky size of EEG-collecting devices. To relieve the problem, we proposed a novel two-stage channel-aware Set Transformer Network that could perform seizure prediction with fewer EEG channel sensors. We also tested a seizure-independent division method which could prevent the adjacency of training and test data. Experiments were performed on the CHB-MIT dataset which includes 22 patients with 88 merged seizures. The mean sensitivity before channel selection was 76.4% with a false predicting rate (FPR) of 0.09/hour. After channel selection, dominant channels emerged in 20 out of 22 patients; the average number of channels was reduced to 2.8 from 18; and the mean sensitivity rose to 80.1% with an FPR of 0.11/hour. Furthermore, experimental results on the seizure-independent division supported our assertion that a more rigorous seizure-independent division should be used for patients with abundant EEG recordings.


Summary

The paper presents a novel two-stage channel-aware Set Transformer that reduces the electrode count while enhancing seizure prediction accuracy.
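It evaluates with both the conventional even division and a stricter seizure-independent division, which prevents temporally adjacent interictal data from appearing in both the training and test sets and so gives a more realistic estimate of clinical performance.
After channel selection, the method reaches a mean sensitivity of 80.1% at a false prediction rate of 0.11/h, and its low per-step latency makes real-time monitoring feasible with remote inference.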


Introduction and Motivation

Epilepsy is a prevalent neurological disorder affecting over fifty million people worldwide, a significant proportion of whom have intractable seizures. Timely and accurate seizure prediction is crucial for effective intervention and patient safety. However, wearable EEG devices are hampered by the large sensor arrays they require, which limits their practicality for continuous everyday use. This work proposes a method that drastically reduces the number of sensors while maintaining, and for many patients improving, sensitivity at a comparable false prediction rate.

Methodology

Data Preparation and Feature Engineering

The study uses the CHB-MIT EEG dataset, which contains recordings from 22 pediatric patients with the same 18 EEG channels available for every patient. Interictal data are defined by excluding only the one-hour period surrounding each seizure; this makes prediction harder than with strategies that exclude longer periods around seizures. Features are computed with FFT-based spectral analysis on 2-second windows with 50% overlap, yielding 44 features per channel: absolute band power, relative band power, and inter-band power ratios across eight frequency sub-bands.
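The 44-dimensional feature vector can be made concrete with a short NumPy sketch. It assumes CHB-MIT's 256 Hz sampling rate and one decomposition consistent with the stated count: 8 absolute band powers, 8 relative band powers, and all C(8, 2) = 28 pairwise band-power ratios (8 + 8 + 28 = 44). The band edges in `BANDS` and the function names are hypothetical, not taken from the paper.

```python
import itertools

import numpy as np

FS = 256        # CHB-MIT sampling rate (Hz)
WIN_SEC = 2.0   # window length from the text
HOP_SEC = 1.0   # 50% overlap of a 2 s window -> 1 s hop
# Hypothetical band edges (Hz); the paper's exact eight sub-bands may differ.
BANDS = [(0.5, 2), (2, 4), (4, 8), (8, 13), (13, 20), (20, 30), (30, 45), (45, 60)]
EPS = 1e-12     # guards against division by zero in empty bands


def band_features(window: np.ndarray, fs: int = FS) -> np.ndarray:
    """Return 44 spectral features for one single-channel window."""
    freqs = np.fft.rfftfreq(window.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(window)) ** 2
    absolute = np.array([psd[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in BANDS])
    # Relative power: each band's share of the total power in the eight bands.
    relative = absolute / (absolute.sum() + EPS)
    # All pairwise ratios between the 8 bands: C(8, 2) = 28 values.
    ratios = np.array([absolute[i] / (absolute[j] + EPS)
                       for i, j in itertools.combinations(range(len(BANDS)), 2)])
    return np.concatenate([absolute, relative, ratios])  # 8 + 8 + 28 = 44


def segment_features(signal: np.ndarray, fs: int = FS) -> np.ndarray:
    """Slide a 2 s window with 50% overlap over a (channels, samples) array.

    Returns an array of shape (n_windows, channels, 44).
    """
    win, hop = int(WIN_SEC * fs), int(HOP_SEC * fs)
    starts = range(0, signal.shape[1] - win + 1, hop)
    return np.stack([
        np.stack([band_features(ch[s:s + win], fs) for ch in signal])
        for s in starts
    ])
```

For example, 10 seconds of 18-channel noise, `segment_features(np.random.default_rng(0).standard_normal((18, 10 * FS)))`, yields nine overlapping windows and a `(9, 18, 44)` array.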

Division Strategies

A key methodological difference is the adoption of a seizure-independent division in addition to the conventional "even division." The seizure-independent division prevents a continuous interictal recording from being split so that adjacent segments land in both the training and test sets. Training and test data are instead separated at the level of seizure events, which matches clinical use more closely, because a deployed model must predict seizures whose surrounding recordings it has never seen.

Figure 1: Illustration of ictal, preictal, interictal periods, SPH, and data exclusion boundaries.

Figure 2: Schematic contrasting the even division and seizure-independent division; only the latter guarantees no interleaving between held-out and training data.
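The difference between the two division strategies can be illustrated with a minimal sketch. Here even division is assumed to mean shuffling interictal segments before splitting them into equal folds, and seizure-independent division is approximated by assigning each interictal segment to the fold of its nearest seizure onset. Both readings and all function names are illustrative assumptions, and the sketch ignores the exclusion window around seizures.

```python
import numpy as np


def even_division(n_segments: int, n_folds: int, seed: int = 0) -> np.ndarray:
    """Assumed even division: shuffle segments, then split into equal folds.

    Temporally adjacent (highly correlated) segments can land in different
    folds, so a held-out fold interleaves in time with training data.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_segments)
    folds = np.empty(n_segments, dtype=int)
    for k, idx in enumerate(np.array_split(order, n_folds)):
        folds[idx] = k
    return folds


def seizure_independent_division(segment_times: np.ndarray,
                                 seizure_onsets: np.ndarray) -> np.ndarray:
    """Assumed seizure-independent division: one fold per seizure event.

    Each segment joins the fold of its nearest seizure onset. On a 1-D
    timeline these nearest-onset regions are contiguous intervals, so
    holding out a fold removes a single contiguous block of recording.
    """
    dist = np.abs(segment_times[:, None] - seizure_onsets[None, :])
    return dist.argmin(axis=1)


def n_fold_switches(folds: np.ndarray, segment_times: np.ndarray) -> int:
    """Count how often the fold label changes along the time axis."""
    ordered = folds[np.argsort(segment_times)]
    return int(np.count_nonzero(ordered[1:] != ordered[:-1]))
```

Counting fold switches along the time axis shows the contrast: nearest-onset assignment of 1,000 segments around three seizures switches folds only twice, whereas the shuffled split interleaves folds almost everywhere.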

Two-Stage Channel-aware Set Transformer

The core innovation is the architectural design: a two-stage channel-aware Set Transformer. Rather than a standard Transformer, the model uses the Set Transformer, which is permutation-invariant and computationally efficient, and therefore needs no positional encoding.
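The permutation invariance that removes the need for positional encoding can be checked numerically. The sketch below follows the Multi-head Attention Block (MAB) and pooling-by-multihead-attention (PMA) definitions of the original Set Transformer formulation, with random weights standing in for trained ones. The width `D`, the head count, and the omission of PMA's input feed-forward layer are simplifying assumptions, not this paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, HEADS = 16, 4  # hypothetical model width and number of heads


def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


# Random matrices stand in for trained projection and feed-forward weights.
Wq, Wk, Wv, Wo, W1, W2 = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(6))


def multihead(q, k, v):
    """Scaled dot-product attention split across HEADS heads."""
    dh = D // HEADS
    Q = (q @ Wq).reshape(len(q), HEADS, dh).transpose(1, 0, 2)  # (H, n_q, dh)
    K = (k @ Wk).reshape(len(k), HEADS, dh).transpose(1, 0, 2)  # (H, n_k, dh)
    V = (v @ Wv).reshape(len(v), HEADS, dh).transpose(1, 0, 2)  # (H, n_k, dh)
    attn = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(dh))       # (H, n_q, n_k)
    out = (attn @ V).transpose(1, 0, 2).reshape(len(q), D)
    return out @ Wo


def mab(x, y):
    """MAB(X, Y) = LN(H + rFF(H)), with H = LN(X + Multihead(X, Y, Y))."""
    h = layer_norm(x + multihead(x, y, y))
    return layer_norm(h + np.maximum(h @ W1, 0) @ W2)


def pma(z, seed):
    """Simplified PMA: learnable seed queries attend over the input set z."""
    return mab(seed, z)


z = rng.standard_normal((37, D))     # a set of 37 element embeddings
seed = rng.standard_normal((1, D))   # one seed vector -> one pooled output
perm = rng.permutation(37)
print(np.allclose(pma(z, seed), pma(z[perm], seed)))
```

Because the seed queries attend over the whole set and attention sums over keys, reordering the input rows leaves the pooled output unchanged, while a self-attention block `mab(z, z)` is permutation-equivariant: its output rows are reordered in step with the input.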

Temporal Set Transformer: For each EEG channel, sequential segment features over a 38-second window are aggregated temporally using a Set Transformer, modulated via a learnable kernel vector.
Channel-Aware Set Transformer: Temporally merged features from all electrode channels are combined using a second Set Transformer augmented with an attention accumulation and softmax-based selection mechanism, allowing patient-specific determination of dominant channels.
Figure 3: Diagram of the Multi-head Attention Block (MAB) core to the Set Transformer operation.

Figure 4: Overview of the predictive network: sequential per-channel temporal aggregation followed by inter-channel attention and selection.

Figure 5: Dataflow in the channel-aware Set Transformer. The architecture supports dynamic, patient-specific channel selection and retraining on reduced channel sets.
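The two-stage dataflow can be sketched at the level of tensor shapes. This is a deliberately simplified stand-in: each stage is a single-head attention pooling with a learnable query vector (reading the first-stage query as the "learnable kernel vector" is an assumption), not the paper's full Set Transformer blocks. `T = 37` assumes a 38-second input sliced into 2-second windows with a 1-second hop, and all weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
T, C, F, D = 37, 18, 44, 16  # segments per input, channels, features, width (assumed)

W_in = rng.standard_normal((F, D)) / np.sqrt(F)  # feature projection
q_time = rng.standard_normal(D)   # hypothetical learnable query for stage 1
q_chan = rng.standard_normal(D)   # hypothetical learnable query for stage 2
w_out = rng.standard_normal(D)    # hypothetical output layer


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def attention_pool(x, q):
    """Pool a set x of shape (..., n, D) into (..., D); also return the weights."""
    w = softmax(x @ q / np.sqrt(D), axis=-1)  # (..., n)
    return (w[..., None] * x).sum(axis=-2), w


def forward(features):
    """features: (C, T, F) per-channel sequences of segment features."""
    h = features @ W_in                                   # (C, T, D)
    per_channel, _ = attention_pool(h, q_time)            # stage 1 (temporal): (C, D)
    pooled, chan_w = attention_pool(per_channel, q_chan)  # stage 2 (channels): (D,)
    prob = 1.0 / (1.0 + np.exp(-pooled @ w_out))          # preictal probability
    return prob, chan_w
```

The second-stage weights `chan_w` are per-sample channel attention scores; accumulating them over many samples is what allows dominant channels to be identified. Since both stages pool over sets, reordering the channels permutes `chan_w` but leaves the prediction unchanged.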

After attention-based channel selection and retraining, the average number of channels drops from 18 to 2.8, substantially reducing the hardware and practical burden of EEG monitoring.
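A possible form of the attention-accumulation and softmax-based selection step is sketched below. The accumulation over samples, the temperature-scaled softmax, and the cumulative-mass threshold `mass` are all hypothetical choices for illustration; the paper's exact selection rule may differ.

```python
import numpy as np


def select_channels(attn_weights: np.ndarray, mass: float = 0.9,
                    temperature: float = 1.0) -> np.ndarray:
    """Pick dominant channels from accumulated channel-attention weights.

    attn_weights: (n_samples, n_channels) per-sample channel attention.
    Returns the sorted indices of the smallest set of channels whose
    softmax-normalised accumulated score reaches `mass`.
    """
    accumulated = attn_weights.sum(axis=0)
    # Softmax over log-scores; a temperature below 1 sharpens the distribution.
    logits = np.log(accumulated / accumulated.sum()) / temperature
    scores = np.exp(logits - logits.max())
    scores /= scores.sum()
    order = np.argsort(scores)[::-1]           # strongest channels first
    cumulative = np.cumsum(scores[order])
    k = int(np.searchsorted(cumulative, mass) + 1)
    return np.sort(order[:k])
```

When attention concentrates on a few electrodes, only those survive (for example, weights of 0.6, 0.3, 0.05, and 0.05 with `mass=0.85` keep channels 0 and 1); when it stays spread evenly, almost all channels are kept, which corresponds to patients for whom no dominant channels emerge.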

Real-time Deployment Considerations

With only 37.4K parameters and 8.23M FLOPs per inference, the model processes each newly arriving 1-second EEG segment in 33.5 ms (preprocessing plus inference), which supports real-time seizure risk evaluation. Although on-device computation remains infeasible, the approach is compatible with remote or cloud-based inference, provided EEG data are transmitted continuously.
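The real-time claim reduces to simple arithmetic: 33.5 ms of processing per 1-second step is a real-time factor of about 0.034, and one inference per second amounts to roughly 8.23 MFLOP/s of sustained compute. The sketch below pairs that calculation with a hypothetical streaming buffer that holds the most recent 38 seconds of EEG (the input span of the temporal stage) at CHB-MIT's 256 Hz and calls a placeholder model once per second. `StreamingPredictor` and `predict_fn` are illustrative names, not part of the paper.

```python
from collections import deque

import numpy as np

FS = 256            # CHB-MIT sampling rate (Hz)
WINDOW_SEC = 38     # input span per prediction, from the text
STEP_SEC = 1.0      # a new segment arrives every second
LATENCY_S = 0.0335  # reported preprocessing + inference time per step
FLOPS_PER_INFERENCE = 8.23e6


class StreamingPredictor:
    """Keep the most recent 38 s of multichannel EEG and predict once per second."""

    def __init__(self, n_channels: int, predict_fn):
        self.buffer = deque(maxlen=WINDOW_SEC)  # one (n_channels, FS) chunk per second
        self.n_channels = n_channels
        self.predict_fn = predict_fn            # hypothetical model callable

    def push(self, chunk: np.ndarray):
        """Append a 1 s chunk; return a prediction once the buffer is full."""
        if chunk.shape != (self.n_channels, FS):
            raise ValueError(f"expected shape {(self.n_channels, FS)}, got {chunk.shape}")
        self.buffer.append(chunk)  # the oldest second is dropped automatically
        if len(self.buffer) < WINDOW_SEC:
            return None
        window = np.concatenate(list(self.buffer), axis=1)  # (n_channels, 38 * FS)
        return self.predict_fn(window)


# Real-time factor: processing time per step divided by the step length.
rtf = LATENCY_S / STEP_SEC
flops_per_second = FLOPS_PER_INFERENCE / STEP_SEC
print(f"real-time factor = {rtf:.4f}, sustained compute = {flops_per_second / 1e6:.2f} MFLOP/s")
```

A real-time factor well below 1 leaves ample headroom for network transfer when inference runs remotely.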

Results

Predictive Performance Across Divisions

Two evaluation schemes are analyzed: even division and the more stringent seizure-independent division.

Even Division: Before channel selection, mean sensitivity and FPR were 76.4% and 0.09/h, respectively. After channel selection, mean sensitivity rose to 80.1% (FPR 0.11/h) with 2.8 channels on average. Several patients reached 100% sensitivity; only 2 of the 22 patients showed no convergence of attention onto dominant channels.
Seizure-Independent Division: Mean sensitivity was 72.6% (FPR 0.08/h) before channel selection and remained unchanged after it, while FPR rose modestly to 0.10/h.
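Sensitivity and FPR here are event-based metrics. A common way to compute them, sketched below under assumed conventions, counts a seizure as predicted when an alarm falls in its seizure occurrence period (SOP) after the seizure prediction horizon (SPH), and counts every other alarm as a false alarm per interictal hour. The SOP and SPH values in the test are hypothetical, and the reported figures are means of such per-patient values.

```python
import numpy as np


def event_metrics(alarm_times, seizure_onsets, interictal_hours, sop_s, sph_s):
    """Event-based sensitivity and false prediction rate (FPR, per hour).

    A seizure counts as predicted if some alarm falls in
    [onset - sph_s - sop_s, onset - sph_s); alarms outside every such
    window count as false alarms (a simplifying assumption).
    """
    alarms = np.asarray(alarm_times, dtype=float)
    onsets = np.asarray(seizure_onsets, dtype=float)
    lo = onsets - sph_s - sop_s
    hi = onsets - sph_s
    hit = (alarms[:, None] >= lo) & (alarms[:, None] < hi)  # (n_alarms, n_seizures)
    sensitivity = hit.any(axis=0).mean()          # fraction of seizures predicted
    false_alarms = int((~hit.any(axis=1)).sum())  # alarms that predicted nothing
    return sensitivity, false_alarms / interictal_hours
```

For example, with alarms at 3,000 s, 10,000 s, and 12,000 s, seizures at 5,000 s and 20,000 s, a 300 s SPH, a 1,800 s SOP, and 10 interictal hours, one of two seizures is predicted (sensitivity 0.5) and two alarms are false (FPR 0.2/h).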

Strong Numerical Results and Counterintuitive Findings

The method matches or surpasses state-of-the-art models that use many more channels or are evaluated with less stringent data divisions.
After channel selection, sensitivity improves (from 76.4% to 80.1% under even division) despite the drastic reduction in channel count, at the cost of a slightly higher FPR; the accuracy-versus-channel-count tradeoff frequently observed in prior work is thus partially circumvented.
The performance gap between even and seizure-independent division (76.4% versus 72.6% mean sensitivity before channel selection) highlights the risk of overestimating generalization with conventional splits, a result corroborated by detailed ablation analyses.
Figure 6: Visualization of model outputs during interictal and preictal states; most false positives during interictal periods are transient and easily post-processed out.
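Transient interictal false positives like those in Figure 6 can be suppressed with a simple post-processing rule. The k-of-n smoothing with a refractory period below is one standard choice and an assumption here, as are the threshold, `k`, `n`, and refractory length; the paper's own post-processing may differ.

```python
import numpy as np


def k_of_n_alarms(probs, threshold=0.5, k=8, n=10, refractory=600):
    """Turn per-second preictal probabilities into sparse alarms.

    An alarm fires at step t when at least k of the last n outputs
    (including step t) exceed `threshold`; further alarms are then
    suppressed for `refractory` steps. All parameter values are hypothetical.
    """
    above = (np.asarray(probs) > threshold).astype(int)
    # Full convolution with a length-n box gives causal trailing-window counts.
    counts = np.convolve(above, np.ones(n, dtype=int))[: len(above)]
    alarms, last = [], -np.inf
    for t, c in enumerate(counts):
        if c >= k and t - last >= refractory:
            alarms.append(t)
            last = t
    return alarms
```

A three-second spike never reaches 8 of 10 positive outputs and is ignored, while a sustained run starting at step 50 triggers a single alarm at step 57 and the refractory period suppresses repeats.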

Comparative Analysis

When compared to previous literature, including strong baselines such as Affes et al. [affes2022personalized], Shu et al. [shu2024data], and Truong et al. [truong2018convolutional]:

The two-stage Set Transformer achieves competitive or superior sensitivity and FPR using significantly fewer electrode channels.
Competing attention-based selection networks either fail to match these error rates or cannot generalize channel selection as robustly at this sensor sparsity.

Ablation and Computational Analysis

Ablation studies confirm that the principal gains come from the channel-aware selection step, with the first-stage Set Transformer offering consistent but not decisive advantages over strong baselines such as an LSTM. The model's small size and low inference latency confirm its suitability for real-time use given sufficient hardware.

Implications and Future Directions

The strong performance with only a subset of channels has direct implications for the design of future miniaturized, patient-specific EEG wearables, which could be smaller, cheaper, and more comfortable without sacrificing predictive accuracy. The seizure-independent division protocol should become standard in future benchmarking to avoid overfitting to contiguous data.

Theoretically, these results reinforce the utility of permutation-invariant transformer architectures for multivariate time-series analysis, particularly in medical and clinical settings where sensor redundancy is high and feature order is not meaningful.

Potential extensions include adaptation to multimodal signals, integration with transfer learning for low-resource patient adaptation, deployment in federated or edge setups, and exploration of alternative feature descriptors for poorly characterized patients.

Conclusion

This work demonstrates that a two-stage, channel-aware Set Transformer with dynamic, patient-specific channel selection can predict epileptic seizures with high sensitivity and low false alarm rates while using far fewer EEG channels than conventional approaches (2.8 on average instead of 18). It also challenges standard benchmarking practice by showing that seizure-independent division gives a more rigorous estimate of real-world performance, particularly for patients with abundant EEG recordings. The resulting framework contributes both a practical clinical tool and an important methodological update for future seizure prediction research.
