LaBraM-Based EEG Encoder
- The LaBraM-based encoder is a pre-trained neural architecture designed to tokenize and compress multichannel EEG data using patch-and-quantize methods.
- It segments continuous EEG signals into patches that are embedded using temporal convolutions and vector-quantized via a learnable codebook, ensuring scale invariance.
- The encoder underpins EEG foundation models, achieving performance gains in reconstruction accuracy and cross-dataset generalization for various BCI tasks.
A LaBraM-based encoder is a pre-trained neural architecture designed for large-scale electroencephalogram (EEG) representation learning, distinguished by its "patch-and-quantize" paradigm and vector-quantized neural tokenization of spectral EEG features. It forms the foundation of the Large Brain Model (LaBraM) family, which aims to address the heterogeneity, noise, and limited scale of EEG datasets through unified and scalable self-supervised learning, facilitating generalizable neural representations for brain-computer interfaces (BCIs) and related applications (Jiang et al., 2024, Barmpas et al., 22 May 2025).
1. Architectural Overview
LaBraM-based encoders segment continuous multichannel EEG time-series into patches and encode them into compact discrete representations using a two-stage pipeline: (1) temporal convolution-based embedding of EEG snippets and (2) vector-quantization via a learnable neural codebook. Following this, a Transformer encoder processes the tokenized sequences over spatio-temporal EEG patch positions.
Both the original LaBraM and the enhanced LaBraM++ retain this paradigm, differing primarily in the pre-processing/tokenization strategy, codebook design, and phase-aware loss functions. The encoder backbone typically comprises 12–48 Transformer blocks (depending on model size) with multi-head self-attention and residual connections; hyperparameters such as model dimensionality, head count, and MLP width are set according to model configuration (Base/Large/Huge) (Jiang et al., 2024, Barmpas et al., 22 May 2025).
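As a concrete illustration, the following is a minimal PyTorch sketch of such a backbone, assuming the Base configuration cited above (12 blocks, 10 heads, $D = 200$); the class name, MLP ratio, and pre-norm layout are assumptions for illustration, not the released implementation.

```python
import torch
import torch.nn as nn

class LaBraMStyleBackbone(nn.Module):
    """Illustrative Base-sized Transformer backbone over embedded EEG patches."""
    def __init__(self, dim: int = 200, depth: int = 12, heads: int = 10, mlp_ratio: float = 4.0):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=int(dim * mlp_ratio),
            activation="gelu", batch_first=True, norm_first=True,
        )
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, patch_embeddings: torch.Tensor) -> torch.Tensor:
        # patch_embeddings: (batch, num_patches, dim), with spatial/temporal
        # position embeddings already added by the tokenizer stage
        return self.blocks(patch_embeddings)
```

Larger variants (Large/Huge) scale `depth`, `dim`, and `heads` accordingly.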
2. EEG Input Segmentation and Pre-processing
EEG signals (with $C$ channels/electrodes sampled over time) undergo bandpass filtering (0.5–44.5 Hz) and resampling to 200 Hz to mitigate mains noise and standardize sampling frequency. Next, the signal is divided into non-overlapping patches of fixed duration—typically $T = 200$ samples (1 s). For windowed sampling, a stride equal to the window length is used (4 s windows with no overlap). In LaBraM++, 256 one-second patches are randomly sampled per recording, with random subsets of electrodes, yielding an input tensor $X \in \mathbb{R}^{C \times P \times T}$ (where $P = 256$ patches and $T = 200$ samples per patch) (Barmpas et al., 22 May 2025, Jiang et al., 2024).
Pre-processing may additionally include common-average referencing (CAR) across channels and Z-scoring in time to suppress shared noise and amplitude artifacts prior to tokenization.
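A minimal sketch of this preprocessing chain, assuming SciPy/NumPy and the stated settings (0.5–44.5 Hz bandpass, 200 Hz resampling, 1 s patches, optional CAR and z-scoring); the filter order, zero-phase filtering, and function name are illustrative choices, not taken from the papers.

```python
import numpy as np
from math import gcd
from scipy.signal import butter, filtfilt, resample_poly

def preprocess_eeg(x: np.ndarray, fs: float, target_fs: int = 200,
                   band=(0.5, 44.5), patch_sec: float = 1.0) -> np.ndarray:
    """Bandpass, resample, CAR/z-score, and patch an EEG recording.

    x: (channels, time) raw EEG; returns (channels, num_patches, patch_len).
    """
    # 1. Bandpass filter 0.5-44.5 Hz (illustrative 4th-order Butterworth, zero-phase)
    b, a = butter(4, band, btype="bandpass", fs=fs)
    x = filtfilt(b, a, x, axis=-1)

    # 2. Resample to 200 Hz
    g = gcd(int(fs), target_fs)
    x = resample_poly(x, target_fs // g, int(fs) // g, axis=-1)

    # 3. Optional common-average reference and per-channel z-scoring
    x = x - x.mean(axis=0, keepdims=True)
    x = (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + 1e-8)

    # 4. Split into non-overlapping 1 s patches (200 samples each at 200 Hz)
    patch_len = int(target_fs * patch_sec)
    n_patches = x.shape[-1] // patch_len
    return x[:, : n_patches * patch_len].reshape(x.shape[0], n_patches, patch_len)
```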
3. Patch Embedding, Codebook, and Quantization
Each EEG patch is embedded via a stack of temporal convolutional (1D or 2D) layers followed by GroupNorm and GELU activations, projecting each patch to a $D$-dimensional latent vector ($D = 200$ for most Base models). Spatial (electrode-specific) and temporal (sequence-index) embeddings are added to capture anatomical and temporal relationships:

$$\mathbf{e}_{c,p} = \mathrm{TempConv}(\mathbf{x}_{c,p}) + \mathbf{SE}_c + \mathbf{TE}_p,$$

with learnable dictionaries of size $C_{\max} \times D$ (spatial) and $P_{\max} \times D$ (temporal) (Barmpas et al., 22 May 2025).
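A hedged sketch of such a patch embedder is shown below; the kernel sizes, strides, temporal pooling step, and class name are illustrative assumptions, while the overall structure (temporal convolutions with GroupNorm/GELU plus learned spatial and temporal embedding tables) follows the description above.

```python
import torch
import torch.nn as nn

class PatchEmbedder(nn.Module):
    """Sketch: temporal-convolution patch embedding with spatial/temporal embeddings."""
    def __init__(self, patch_len: int = 200, dim: int = 200,
                 max_channels: int = 128, max_patches: int = 256):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv1d(1, dim, kernel_size=15, stride=8, padding=7),
            nn.GroupNorm(4, dim), nn.GELU(),
            nn.Conv1d(dim, dim, kernel_size=3, stride=2, padding=1),
            nn.GroupNorm(4, dim), nn.GELU(),
            nn.Conv1d(dim, dim, kernel_size=3, stride=2, padding=1),
            nn.GroupNorm(4, dim), nn.GELU(),
        )
        self.spatial_embed = nn.Embedding(max_channels, dim)   # SE_c: one row per electrode
        self.temporal_embed = nn.Embedding(max_patches, dim)   # TE_p: one row per patch index

    def forward(self, patches, channel_ids, patch_ids):
        # patches: (batch, channels, num_patches, patch_len)
        B, C, P, T = patches.shape
        h = self.convs(patches.reshape(B * C * P, 1, T))        # (B*C*P, dim, T')
        h = h.mean(dim=-1).reshape(B, C, P, -1)                 # pool over time -> (B, C, P, dim)
        h = h + self.spatial_embed(channel_ids)[:, :, None, :]  # add electrode embedding SE_c
        h = h + self.temporal_embed(patch_ids)[:, None, :, :]   # add patch-index embedding TE_p
        return h
```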
Quantization is performed using a learnable codebook $\mathcal{V} = \{\mathbf{v}_k\}_{k=1}^{K}$ ($\mathbf{v}_k \in \mathbb{R}^{D}$, $K$ discrete codes). Each embedded patch $\mathbf{e}_{c,p}$ is assigned a discrete token $z_{c,p}$ based on the nearest codebook entry in $\ell_2$-normalized space:

$$z_{c,p} = \arg\min_{k}\,\left\lVert \frac{\mathbf{e}_{c,p}}{\lVert \mathbf{e}_{c,p} \rVert_2} - \frac{\mathbf{v}_k}{\lVert \mathbf{v}_k \rVert_2} \right\rVert_2,$$

ensuring the tokenization is invariant to vector scale and sensitive only to angular relationships, in line with common practices for neural vector-quantized representations (Jiang et al., 2024, Barmpas et al., 22 May 2025).
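The $\ell_2$-normalized lookup can be sketched as follows; the codebook size of 8192 and the straight-through gradient estimator are assumptions in the spirit of standard VQ practice rather than guaranteed details of the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class L2NormalizedQuantizer(nn.Module):
    """Sketch of cosine-similarity (l2-normalized) codebook lookup."""
    def __init__(self, num_codes: int = 8192, dim: int = 200):  # num_codes is an assumption
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, e: torch.Tensor):
        # e: (..., dim) patch embeddings
        e_n = F.normalize(e, dim=-1)                           # unit-length embeddings
        v_n = F.normalize(self.codebook.weight, dim=-1)        # unit-length codes
        # Minimizing ||e_n - v_k_n||_2 is equivalent to maximizing cosine similarity
        token_ids = (e_n @ v_n.t()).argmax(dim=-1)             # (...,) discrete tokens
        quantized = v_n[token_ids]                             # quantized (normalized) vectors
        # Straight-through estimator so gradients flow back to the encoder
        quantized = e_n + (quantized - e_n).detach()
        return token_ids, quantized
```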
4. Training Objectives, Phase-Aware Loss, and Model Refinements
The codebook and embedding layers are jointly optimized using a composite reconstruction objective. Each EEG patch is Fourier-transformed to extract its amplitude spectrum $A$ and phase spectrum $\phi$. The decoder head predicts $\hat{A}$ and $\hat{\phi}$ from the quantized embeddings. Loss functions include:
- Mean squared amplitude loss: $\mathcal{L}_A = \lVert \hat{A} - A \rVert_2^2$
- Phase loss terms: $\mathcal{L}_\phi = \lVert \sin\hat{\phi} - \sin\phi \rVert_2^2 + \lVert \cos\hat{\phi} - \cos\phi \rVert_2^2$, which simplifies (element-wise) to $2 - 2\cos(\hat{\phi} - \phi)$, thus respecting the circular nature of phase and providing smooth gradients across the phase-wrap discontinuity (Barmpas et al., 22 May 2025).
- Vector-quantization loss ($\mathcal{L}_{\mathrm{VQ}}$), as used in VQ-VAE.
LaBraM++ replaces the direct-phase regression loss of LaBraM with this sine/cosine variant, yielding smoother gradients and improving optimization near phase-wrap points.
The full tokenizer loss is:

$$\mathcal{L}_{\text{tokenizer}} = \mathcal{L}_A + \mathcal{L}_\phi + \mathcal{L}_{\mathrm{VQ}}.$$
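Using the definitions above, the composite tokenizer objective can be sketched as follows; `amp_pred` and `phase_pred` stand for the decoder-head outputs, and the commitment weight `beta` is an assumption borrowed from VQ-VAE practice rather than a value from the papers.

```python
import torch
import torch.nn.functional as F

def tokenizer_loss(x_patch, amp_pred, phase_pred, e, v_q, beta: float = 0.25):
    """Sketch: amplitude MSE + sin/cos phase loss + VQ-VAE-style codebook/commitment terms."""
    # Fourier targets: amplitude and phase spectra of each raw patch
    spec = torch.fft.rfft(x_patch, dim=-1)
    amp_true, phase_true = spec.abs(), spec.angle()

    # Amplitude reconstruction loss L_A
    loss_amp = F.mse_loss(amp_pred, amp_true)

    # Phase loss L_phi on the unit circle: smooth across the phase wrap
    loss_phase = (F.mse_loss(torch.sin(phase_pred), torch.sin(phase_true)) +
                  F.mse_loss(torch.cos(phase_pred), torch.cos(phase_true)))

    # VQ loss L_VQ: codebook update + commitment, with stop-gradients as in VQ-VAE
    loss_vq = F.mse_loss(v_q, e.detach()) + beta * F.mse_loss(e, v_q.detach())

    return loss_amp + loss_phase + loss_vq
```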
Downstream Transformer encoder pretraining employs masked or reconstruction-style language modeling over the token sequence, using the same encoder architecture as in the neural-tokenizer stage (Jiang et al., 2024, Barmpas et al., 22 May 2025).
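A minimal sketch of such a masked-modeling step, assuming the encoder is trained to predict the discrete codebook index of each masked patch; the mask ratio, the zero-vector mask token, and the helper names are illustrative placeholders rather than the authors' implementation.

```python
import torch
import torch.nn as nn

def masked_modeling_step(backbone, head, patch_embeddings, token_ids, mask_ratio: float = 0.5):
    """Mask a subset of patch embeddings and predict their tokenizer-assigned codes.

    backbone: module mapping (B, N, D) -> (B, N, D); head: e.g. nn.Linear(D, num_codes).
    """
    B, N, D = patch_embeddings.shape
    mask = torch.rand(B, N, device=patch_embeddings.device) < mask_ratio
    mask_token = torch.zeros(D, device=patch_embeddings.device)   # stand-in for a learnable token
    x = torch.where(mask[..., None], mask_token, patch_embeddings)
    logits = head(backbone(x))                                    # (B, N, num_codes)
    # Cross-entropy only over masked positions against the frozen tokenizer's codes
    return nn.functional.cross_entropy(logits[mask], token_ids[mask])
```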
5. Implementation Details and Training Pipeline
Key implementation steps are summarized below:
| Step | Operation | Key Hyperparameters |
|---|---|---|
| 1. Segmentation/Preprocessing | Bandpass (0.5–44.5 Hz), resample to 200 Hz, patch extraction, optional CAR/Z-score | $f_s = 200$ Hz, $T = 200$ samples (1 s), 256 patches |
| 2. Embedding | Temporal CONVs + added SE/TE | 3 conv layers, $D = 200$ |
| 3. Quantization | $\ell_2$ normalization, hard nearest-neighbor lookup, codebook update (EMA/learned) | codebook size $K$, code dimension $D$ |
| 4. Tokenizer Pretraining | Spectrum regression w/ VQ loss, phase via sin/cos losses | see Section 4 |
| 5. Transformer Pretraining | Masked modeling over discrete token sequence | 12 blocks, 10 heads, $D = 200$ (Base) |
Neural-tokenizer and encoder are trained for 100 and 50 epochs respectively, using AdamW, cosine learning-rate schedules, and gradient clipping. Fine-tuning on downstream EEG tasks follows similar optimizer settings with layerwise learning rate decay and dropout (Jiang et al., 2024, Barmpas et al., 22 May 2025).
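An illustrative training-loop skeleton consistent with this recipe (AdamW, cosine learning-rate schedule, gradient clipping); the model, data, learning rate, weight decay, and clipping norm are placeholders rather than values reported in the papers.

```python
import torch
import torch.nn as nn

# Stand-in Base-sized encoder and synthetic patch-embedding batch for illustration
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=200, nhead=10, batch_first=True), num_layers=12)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.05)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

for epoch in range(50):                          # encoder pretraining runs 50 epochs
    for x in [torch.randn(8, 256, 200)]:         # stand-in for an EEG patch-embedding loader
        out = model(x)
        loss = out.pow(2).mean()                 # stand-in for the masked-modeling loss
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=3.0)
        optimizer.step()
    scheduler.step()                             # cosine learning-rate decay per epoch
```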
6. Advantages, Performance Gains, and Rationale
Discrete tokens constructed via the neural codebook impose a learned, compressive coding over recurrent neural motifs, analogous to Huffman encoding. This improves representation compactness, generalization, and cross-dataset transfer. Sine/cosine phase-aware loss preserves topological continuity, overcoming the limitations of direct-phase regression in gradient-based optimization, especially near phase-wrap discontinuities.
LaBraM++ demonstrates:
- A ~6% absolute mean-accuracy gain over LaBraM across five canonical BCI tasks (Motor, Memory, Sleep-EDF, Eyes Open/Closed).
- ~10% relative reduction in pretraining loss.
- Superior signal-reconstruction, particularly for phase-sensitive, low-frequency oscillatory components.
Flexible randomization of electrode/channel selection and rich spatial/temporal embeddings further enable seamless pretraining on heterogeneous, large-scale EEG montages.
7. Applications and Extensions
LaBraM-based encoders serve as the backbone for foundation models in EEG analysis, enabling universal perceptual features that generalize across abnormal detection, event classification, emotion recognition, and gait prediction. This architecture establishes a paradigm for large EEG models, analogous to LLMs in NLP, supporting unsupervised pretraining and downstream fine-tuning for diverse BCI and neuroinformatics tasks (Jiang et al., 2024, Barmpas et al., 22 May 2025).
Their design supports cross-dataset pretraining, robustness to varying montage configurations, and efficient training at scale, laying the groundwork for extensible transformer-based EEG analytics and foundation models.