CEReBrO: Compact EEG Representation Models
- CEReBrO is a compact family of EEG foundation models that employ per-channel patch tokenization and alternating attention to capture both temporal and spatial signal dependencies.
- Its design reduces computational complexity while delivering state-of-the-art performance on benchmarks like emotion recognition and neonatal seizure detection.
- The model is available in multiple sizes, enabling efficient on-device deployment and scalable research across clinical and cognitive applications.
CEReBrO (Compact Encoder for Representations of Brain Oscillations) is a family of compact EEG foundation models designed to address the unique challenges of electroencephalography (EEG) signal modeling in self-supervised learning contexts. In contrast to large-scale models exceeding hundreds of millions of parameters commonly used in language and vision, CEReBrO achieves efficient and effective representation learning with model sizes in the range of 3.6 million to 85 million parameters. Its architecture introduces per-channel patch tokenization and a novel alternating attention mechanism, enabling state-of-the-art performance on diverse clinical and cognitive EEG benchmarks while maintaining computational efficiency suitable for edge and on-device deployment (Dimofte et al., 18 Jan 2025).
1. Motivation and Design Goals
EEG signals present complex modeling requirements due to their non-linear, non-stationary nature, exhibiting strong intra-channel temporal dependencies and rich inter-channel spatial correlations. Existing self-supervised EEG foundation models face limitations including:
- Sub-optimal spatio-temporal modeling, such as ignoring inter-channel interactions or relying solely on autoregressive temporal modeling.
- Large model sizes (hundreds of millions of parameters), which impede reproducibility, on-device deployment, and attribution of performance gains to architectural innovations versus sheer scale.
- Inconsistent benchmarking practices, often relying on private datasets or varying public splits.
CEReBrO was developed to counteract these issues by offering a compact architecture that: (i) accurately captures both intra-channel and inter-channel dependencies, (ii) is computationally efficient for real-time or embedded use, and (iii) leverages only publicly available data and standard benchmarks (Dimofte et al., 18 Jan 2025).
2. Per-Channel Patch Tokenization Scheme
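The input to CEReBrO consists of raw EEG segments represented as $\mathbf{X} \in \mathbb{R}^{C \times T}$, with $T$ time points and $C$ channels. Each channel is segmented independently into patches of length $P$ with stride $S$ (typically $S = P$, so that patches do not overlap):
$$\mathbf{x}_{c,n} = \big(X_{c,\,nS+1}, \dots, X_{c,\,nS+P}\big) \in \mathbb{R}^{P}, \qquad n = 0, \dots, N-1, \qquad N = \left\lfloor \frac{T-P}{S} \right\rfloor + 1.$$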
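Each patch $\mathbf{x}_{c,n}$ from channel $c$ undergoes a linear projection to yield a $D$-dimensional embedding:
$$\mathbf{e}_{c,n} = \mathbf{W}_{e}\,\mathbf{x}_{c,n} + \mathbf{b}_{e}, \qquad \mathbf{W}_{e} \in \mathbb{R}^{D \times P},\; \mathbf{b}_{e} \in \mathbb{R}^{D}.$$
To retain positional and channel context, a patch-position embedding $\mathbf{p}_{n} \in \mathbb{R}^{D}$ and a channel-identity embedding $\mathbf{s}_{c} \in \mathbb{R}^{D}$ are added, producing the final token:
$$\mathbf{z}_{c,n} = \mathbf{e}_{c,n} + \mathbf{p}_{n} + \mathbf{s}_{c}.$$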
This design preserves local temporal and global spatial information by structuring the input as a sequence of $C \cdot N$ tokens (Dimofte et al., 18 Jan 2025).
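The tokenization steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `tokenize_per_channel`, the example patch length of 64 samples, and the random arrays standing in for the learned projection and the position/channel embeddings are all hypothetical.

```python
import numpy as np

def tokenize_per_channel(x, patch_len, stride, d_model, rng=None):
    """Hypothetical per-channel patch tokenizer (illustrative sketch).

    x: raw EEG array of shape (C, T). Returns tokens of shape (C, N, d_model),
    one token per (channel, patch) pair.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    C, T = x.shape
    N = (T - patch_len) // stride + 1                     # patches per channel
    starts = np.arange(N) * stride
    # Slice every channel independently into patches -> (C, N, P)
    patches = np.stack([x[:, s:s + patch_len] for s in starts], axis=1)
    # Shared linear projection P -> D; random weights stand in for learned ones
    W_e = rng.standard_normal((d_model, patch_len)) / np.sqrt(patch_len)
    b_e = np.zeros(d_model)
    e = patches @ W_e.T + b_e                             # (C, N, D)
    # Placeholders for the learned patch-position and channel-identity embeddings
    pos_emb = 0.02 * rng.standard_normal((N, d_model))    # p_n
    chan_emb = 0.02 * rng.standard_normal((C, d_model))   # s_c
    return e + pos_emb[None, :, :] + chan_emb[:, None, :]

# Hypothetical segment: 19 channels, 1280 samples (e.g. 5 s at 256 Hz), P = S = 64
x = np.random.default_rng(1).standard_normal((19, 1280))
tokens = tokenize_per_channel(x, patch_len=64, stride=64, d_model=192)
print(tokens.shape)  # (19, 20, 192)
```

Keeping the tokens as a $(C, N, D)$ array rather than flattening them immediately makes it easy to slice along either the channel or the patch axis, which is exactly what the two attention modes need.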
3. Alternating Attention Mechanism and Complexity
To circumvent the prohibitive quadratic complexity of standard Transformer self-attention over all $C \cdot N$ tokens, CEReBrO introduces an alternating attention design in which each layer applies one of two modes:
- Intra-channel (temporal) attention: for each channel, attention operates across its $N$ patches.
- Inter-channel (spatial) attention: at each patch index, attention is computed across all $C$ channels.
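Mathematically, for token embeddings $\mathbf{Z} \in \mathbb{R}^{C \times N \times D}$ and projections $\mathbf{W}_Q, \mathbf{W}_K, \mathbf{W}_V \in \mathbb{R}^{D \times d_k}$, the two attention modes are formalized as follows (shown for a single head):
- Intra-channel (even layers):
$$\operatorname{Attn}_{\text{intra}}(\mathbf{Z})_{c,:} = \operatorname{softmax}\!\left(\frac{(\mathbf{Z}_{c,:}\mathbf{W}_Q)(\mathbf{Z}_{c,:}\mathbf{W}_K)^{\top}}{\sqrt{d_k}}\right)\mathbf{Z}_{c,:}\mathbf{W}_V, \qquad \mathbf{Z}_{c,:} \in \mathbb{R}^{N \times D},$$
computed independently for each channel $c$.
- Inter-channel (odd layers):
$$\operatorname{Attn}_{\text{inter}}(\mathbf{Z})_{:,n} = \operatorname{softmax}\!\left(\frac{(\mathbf{Z}_{:,n}\mathbf{W}_Q)(\mathbf{Z}_{:,n}\mathbf{W}_K)^{\top}}{\sqrt{d_k}}\right)\mathbf{Z}_{:,n}\mathbf{W}_V, \qquad \mathbf{Z}_{:,n} \in \mathbb{R}^{C \times D},$$
computed across channels at each patch position $n$.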
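The two modes differ only in which axis of the $(C, N, D)$ token array plays the role of the sequence. The NumPy sketch below illustrates this with a single head, a residual connection, and no LayerNorm or MLP. The function names are hypothetical, and the real model uses 12 heads and full Transformer blocks.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def attention(z, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over axis -2 of z, shape (..., L, D).
    Leading axes are treated as independent sequences."""
    q, k, v = z @ w_q, z @ w_k, z @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])  # (..., L, L)
    return softmax(scores, axis=-1) @ v

def alternating_layer(z, layer_idx, w_q, w_k, w_v):
    """Hypothetical alternating-attention step on tokens z of shape (C, N, D).

    Even layer_idx: intra-channel, each channel attends over its N patches.
    Odd layer_idx: inter-channel, each patch index attends over the C channels.
    """
    if layer_idx % 2 == 0:
        out = attention(z, w_q, w_k, w_v)                       # C sequences of length N
    else:
        z_t = np.swapaxes(z, 0, 1)                              # (N, C, D)
        out = np.swapaxes(attention(z_t, w_q, w_k, w_v), 0, 1)  # N sequences of length C
    return z + out                                              # residual connection

rng = np.random.default_rng(0)
D = 8
w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
z = rng.standard_normal((4, 6, D))   # C = 4 channels, N = 6 patches
z_pert = z.copy()
z_pert[0, 0] += 1.0                  # perturb channel 0, patch 0 only

intra_a, intra_b = alternating_layer(z, 0, w_q, w_k, w_v), alternating_layer(z_pert, 0, w_q, w_k, w_v)
inter_a, inter_b = alternating_layer(z, 1, w_q, w_k, w_v), alternating_layer(z_pert, 1, w_q, w_k, w_v)
print(np.allclose(intra_a[1:], intra_b[1:]))        # True: other channels unaffected
print(np.allclose(inter_a[:, 1:], inter_b[:, 1:]))  # True: other patch indices unaffected
```

The perturbation check shows the intended locality. In an even layer, a change to one token reaches only tokens of the same channel. In an odd layer, it reaches only tokens at the same patch index. Stacking both layer types lets information reach every token within two layers.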
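The combined computation and memory cost of a pair of alternating layers is thus:
$$\mathcal{O}\big(C N^{2} + N C^{2}\big) = \mathcal{O}\big(C N (C + N)\big) \ll \mathcal{O}\big(C^{2} N^{2}\big)$$
for practical values of $C$ and $N$. Empirical evaluations show up to a 2× speed improvement and a 6× reduction in GPU memory compared to standard self-attention (Dimofte et al., 18 Jan 2025).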
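To make the asymptotic argument concrete, the sketch below counts attention-score entries per head for a hypothetical montage of $C = 19$ channels and $N = 20$ patches. These values are chosen only for illustration.

```python
def attention_score_counts(C, N):
    """Entries in the attention score matrices of one layer (per head, per example)."""
    full = (C * N) ** 2    # standard self-attention over all C*N tokens
    intra = C * N ** 2     # C independent sequences of N patches (even layers)
    inter = N * C ** 2     # N independent sequences of C channels (odd layers)
    return full, intra, inter

full, intra, inter = attention_score_counts(C=19, N=20)
print(full, intra, inter)                     # 144400 7600 7220
# Two full-attention layers vs. one intra + one inter layer: 2CN / (C + N)
print(round(2 * full / (intra + inter), 1))   # 19.5
```

End-to-end speed and memory gains are necessarily smaller than this score-matrix ratio. The patch projection, the QKV and output projections, and the MLP blocks scale linearly in the number of tokens, and they cost the same in both designs.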
4. Model Variants and Pre-Training
CEReBrO is available in three sizes, all using 12-head attention:
| Model | Layers | Embedding Dim ($D$) | MLP Width | Parameters | Deployment Target |
|---|---|---|---|---|---|
| CEReBrO-Small | 8 | 192 | 768 | 3.6M | Ultra-lightweight/wearable |
| CEReBrO-Base | 10 | 576 | 2304 | 40M | Smartphone/Edge-TPU |
| CEReBrO-Large | 12 | 768 | 3072 | 85M | High-performance/task-leader |
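The parameter counts in the table can be sanity-checked with the usual estimate for a Transformer encoder block. The sketch below makes several assumptions. It treats each block as a standard ViT-style block: QKV and output projections, a two-layer MLP, and two LayerNorms, all with biases. It also ignores the patch projection, the embeddings, and the reconstruction head. The exact CEReBrO layer layout may differ, and the function name is hypothetical. The head count does not enter the estimate, because the 12 heads split the same $D$-dimensional projections.

```python
def approx_encoder_params(layers, d_model, mlp_width):
    """Approximate parameter count of a stack of standard Transformer encoder blocks."""
    attn = 4 * d_model * d_model + 4 * d_model           # Q, K, V, output weights + biases
    mlp = 2 * d_model * mlp_width + mlp_width + d_model  # two linear layers + biases
    norms = 2 * 2 * d_model                              # two LayerNorms (scale + shift)
    return layers * (attn + mlp + norms)

for name, cfg in [("Small", (8, 192, 768)), ("Base", (10, 576, 2304)), ("Large", (12, 768, 3072))]:
    print(name, f"{approx_encoder_params(*cfg) / 1e6:.1f}M")
# Small 3.6M
# Base 39.9M
# Large 85.1M
```

The estimates agree with the 3.6M, 40M and 85M figures in the table. This suggests that nearly all parameters sit in the encoder blocks rather than in the tokenizer or heads.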
Pre-training uses over 20,000 hours of publicly available EEG from the Temple University EEG Corpus (TUEG), via a SimMIM-style masked autoencoding objective on raw waveforms. Fifty percent of patch tokens are masked, and reconstruction loss is computed over both masked and visible patches:
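$$\mathcal{L}_{\mathcal{S}} = \frac{1}{|\mathcal{S}|} \sum_{(c,n) \in \mathcal{S}} \big\lVert \hat{\mathbf{x}}_{c,n} - \mathbf{x}_{c,n} \big\rVert_2^2, \qquad \mathcal{S} \in \{\mathcal{M}, \mathcal{V}\},$$
where $\mathcal{M}$ and $\mathcal{V}$ are the sets of masked and visible patches, and $\hat{\mathbf{x}}_{c,n}$ is the reconstructed waveform of patch $(c,n)$. The total loss is:
$$\mathcal{L} = \mathcal{L}_{\mathcal{M}} + \mathcal{L}_{\mathcal{V}}$$
(Dimofte et al., 18 Jan 2025).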
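A minimal NumPy sketch of the pre-training objective follows. It covers 50% random patch masking, SimMIM-style replacement of masked tokens with a shared mask embedding, and a reconstruction loss over both masked and visible patches. The function names, the squared-error distance, and the equal weighting of the masked and visible terms are all assumptions made for illustration.

```python
import numpy as np

def random_patch_mask(C, N, ratio=0.5, rng=None):
    """Boolean (C, N) mask; True marks a masked patch. Exactly round(ratio*C*N) are masked."""
    rng = np.random.default_rng() if rng is None else rng
    total = C * N
    mask = np.zeros(total, dtype=bool)
    mask[rng.choice(total, size=int(round(ratio * total)), replace=False)] = True
    return mask.reshape(C, N)

def apply_mask(tokens, mask, mask_token):
    """Replace masked tokens (C, N, D) with a shared mask embedding (D,)."""
    return np.where(mask[..., None], mask_token, tokens)

def reconstruction_loss(x_hat, x, mask):
    """x_hat, x: (C, N, P) waveform patches. Squared L2 error per patch, averaged over
    masked and visible patches separately, then summed with equal weight (assumed)."""
    per_patch = ((x_hat - x) ** 2).sum(axis=-1)  # (C, N)
    return per_patch[mask].mean() + per_patch[~mask].mean()

rng = np.random.default_rng(0)
mask = random_patch_mask(19, 20, ratio=0.5, rng=rng)
print(mask.sum())                          # 190
x = rng.standard_normal((19, 20, 64))      # hypothetical ground-truth patches
print(reconstruction_loss(x, x, mask))     # 0.0
```

Averaging the two groups separately keeps the visible-patch term from being swamped or from dominating, whatever the mask ratio. With the 50% ratio used here, it coincides with a plain mean over all patches up to a factor of two.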
5. Empirical Evaluation and Ablations
CEReBrO models were evaluated on four public EEG benchmarks under both full finetuning and linear probing, using mean-token pooling followed by a linear classifier or regressor:
- Emotion Recognition (SEED): CEReBrO-Large achieves 68.2% accuracy and F1=0.6845 (SOTA), exceeding prior models such as LaBraM (57.9% accuracy, F1=0.5899).
- Neonatal Seizure Detection (Neonate): CEReBrO-Large reaches AUROC=0.875 and AUPR=0.690, outperforming EEGFormer and other self-supervised EEG foundation model (SEFM) baselines.
- Anomaly Classification (TUAB): CEReBrO-Large attains 81.7% accuracy, AUPR=0.905 and AUROC=0.892, competitive with much larger models such as LaBraM-Huge (369M params).
- Gait Prediction (MoBI): CEReBrO-Large achieves an RMSE of 0.1209, nearly matching LaBraM-Huge (369M params) with under a quarter of its parameters.
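In the evaluation protocol, mean-token pooling collapses the $C \times N$ encoder outputs into one vector per example, and a linear head is trained on that vector. The sketch below fits that head with closed-form least squares, as one might for a regression target such as gait. The random "encoder outputs", the synthetic targets, and the function names are hypothetical. A classification head such as logistic regression would take the place of least squares for the other tasks. In linear probing only this head is fit on frozen features, whereas full finetuning also updates the encoder.

```python
import numpy as np

def mean_token_pool(tokens):
    """Average encoder outputs (B, C, N, D) over all C*N tokens -> (B, D)."""
    return tokens.mean(axis=(1, 2))

def fit_linear_head(features, targets):
    """Least-squares linear regressor with bias on frozen features.
    Returns a (D + 1,) array: weights followed by the bias."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return coef

def predict(features, coef):
    return features @ coef[:-1] + coef[-1]

rng = np.random.default_rng(0)
tokens = rng.standard_normal((64, 19, 20, 32))  # hypothetical frozen encoder outputs
feats = mean_token_pool(tokens)                 # (64, 32)
true_w = rng.standard_normal(32)
y = feats @ true_w + 0.5                        # synthetic, exactly linear targets
coef = fit_linear_head(feats, y)
rmse = np.sqrt(np.mean((predict(feats, coef) - y) ** 2))
print(f"{rmse:.1e}")                            # near zero: targets are exactly linear
```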
Ablation studies show that representing EEG as raw waveforms yields ≈1% higher balanced accuracy on TUAB than using spectrograms. For waveform inputs, alternating attention gives a +1.3% balanced-accuracy improvement over standard self-attention, and the same ablation confirms its speed and memory savings (Dimofte et al., 18 Jan 2025).
6. Limitations and Outlook
Despite strong overall results, CEReBrO still shows performance gaps on some clinically specialized tasks, particularly in AUROC. The model's scale is modest compared with vision and language foundation models, and this limits how much further gain can come from parameter scaling alone. Future research directions include:
- Integrating additional efficient attention mechanisms (e.g., linear, sparse, or low-rank variants) to further boost speed and scalability.
- Expanding the framework to multimodal neurophysiological data (e.g., EEG coupled with PPG or accelerometry).
- Establishing automated scaling laws for EEG foundation models.
- Enabling on-device fine-tuning and continuous learning for personalized brain–computer interfaces.
CEReBrO establishes that compact, carefully architected EEG foundation models can approach or surpass the performance of much larger models, opening pathways for transparent, efficient, and reproducible advances in neural signal processing (Dimofte et al., 18 Jan 2025).