RL-BioAug Framework for EEG Augmentation
- The paper demonstrates that RL-BioAug significantly improves EEG representation quality by employing a transformer-based RL agent to choose augmentation strategies using only 10% label guidance.
- RL-BioAug is a framework that integrates a self-supervised encoder with context-aware augmentation, adapting to EEG non-stationarity and outperforming heuristic methods.
- Experimental results show up to 9.7% absolute gain in Macro-F1 scores on Sleep-EDFX and CHB-MIT datasets, highlighting its effectiveness in sleep staging and seizure detection.
RL-BioAug is a label-efficient reinforcement learning (RL) framework that establishes an autonomous paradigm for data augmentation in self-supervised electroencephalography (EEG) representation learning. RL-BioAug employs an RL agent, guided by only a minimal fraction (10%) of labeled data, to determine context-appropriate augmentation strategies for each data sample. Its primary aim is to improve representation quality for downstream tasks under the inherent non-stationarity of EEG signals, outperforming heuristic or random composite augmentation techniques (Lee et al., 20 Jan 2026).
1. System Architecture
RL-BioAug integrates three major modules: a self-supervised encoder, a data-augmentation module, and a transformer-based RL agent. The encoder f_θ is implemented as a 1D ResNet-18 mapping an EEG segment x to an embedding z = f_θ(x). The data-augmentation module generates a "weak" view (x_weak) using fixed small jitter and scaling, and a "strong" view (x_strong) by applying augmentations chosen by the RL agent. The RL agent π_φ uses both the current embedding (state s_t) and a history of past actions and rewards to produce a probability distribution over augmentation actions, sampling one per instance.
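The paper summary above names only the backbone, so the following is a minimal sketch of a 1D ResNet-18-style encoder in PyTorch; the kernel sizes, channel widths, and embedding dimension are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class BasicBlock1D(nn.Module):
    """ResNet basic block with 1D convolutions for raw EEG segments."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv1d(in_ch, out_ch, kernel_size=7, stride=stride,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm1d(out_ch)
        self.conv2 = nn.Conv1d(out_ch, out_ch, kernel_size=7, padding=3, bias=False)
        self.bn2 = nn.BatchNorm1d(out_ch)
        self.shortcut = (nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm1d(out_ch))
            if stride != 1 or in_ch != out_ch else nn.Identity())

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + self.shortcut(x))


class EEGEncoder1D(nn.Module):
    """ResNet-18-style 1D encoder f_theta: EEG segment -> embedding z."""
    def __init__(self, in_ch=1, embed_dim=128):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv1d(in_ch, 64, kernel_size=15, stride=2, padding=7, bias=False),
            nn.BatchNorm1d(64), nn.ReLU(), nn.MaxPool1d(3, stride=2, padding=1))
        blocks, prev = [], 64
        for width in (64, 128, 256, 512):   # two basic blocks per stage ("18-layer" layout)
            blocks += [BasicBlock1D(prev, width, stride=2), BasicBlock1D(width, width)]
            prev = width
        self.stages = nn.Sequential(*blocks)
        self.head = nn.Linear(512, embed_dim)

    def forward(self, x):                    # x: (batch, channels, time)
        h = self.stages(self.stem(x))        # (batch, 512, time')
        return self.head(h.mean(dim=-1))     # global average pool over time -> z
```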
The training loop, executed per batch and sample, consists of:
- Computing s_t = f_θ(x_i) using the (frozen) encoder f_θ.
- Having π_φ select an augmentation action a_t given s_t.
- Augmenting x_i into x_strong (using a_t) and x_weak (using fixed jitter + scaling).
- Updating θ via the InfoNCE loss between the two views.
- Computing the RL agent's reward r_t (using the Soft-KNN metric and the 10% labeled reference set), then updating φ via a policy-gradient step with entropy regularization.
2. Reinforcement-Learning Components
The RL agent operates with:
- State space: the encoder embedding s_t = f_θ(x_i) for each EEG segment.
- Action space: discrete, with one action per distinct "strong" augmentation (a minimal sketch of these transforms follows the list):
- Time Masking: a random contiguous interval is zeroed out.
- Time Permutation: the segment is split into sub-sequences whose order is shuffled.
- Crop & Resize: a contiguous window is cropped, then resized back to the original segment length.
- Time Flip: the time axis is reversed.
- Time Warp: a random subinterval is sped up or slowed down.
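Under the descriptions above, the five strong augmentations can be sketched roughly as follows for single-channel 1-D segments; the parameter values (mask fraction, number of permutation chunks, crop ratio, warp factor) are placeholders, since they are not specified in the text.

```python
import numpy as np

def _resample(x, new_len):
    """Linear resampling of a 1-D signal to a new length."""
    old_len = x.shape[-1]
    return np.interp(np.linspace(0, old_len - 1, new_len), np.arange(old_len), x)

def time_mask(x, max_frac=0.2):
    """Zero out a random contiguous interval (mask fraction is assumed)."""
    x, L = x.copy(), x.shape[-1]
    m = np.random.randint(1, max(2, int(max_frac * L)))
    start = np.random.randint(0, L - m)
    x[start:start + m] = 0.0
    return x

def time_permute(x, n_segments=5):
    """Split the segment into chunks and shuffle their order."""
    chunks = np.array_split(x, n_segments)
    np.random.shuffle(chunks)
    return np.concatenate(chunks)

def crop_resize(x, crop_frac=0.7):
    """Crop a contiguous window and resample it back to the original length."""
    L = x.shape[-1]
    w = int(crop_frac * L)
    start = np.random.randint(0, L - w)
    return _resample(x[start:start + w], L)

def time_flip(x):
    """Reverse the time axis."""
    return x[::-1].copy()

def time_warp(x, factor=1.25):
    """Stretch or compress a random subinterval, then resample back to the original length."""
    L = x.shape[-1]
    a, b = sorted(np.random.choice(L, size=2, replace=False))
    if b - a < 2:
        return x.copy()
    scale = factor if np.random.rand() < 0.5 else 1.0 / factor
    mid = _resample(x[a:b], max(2, int((b - a) * scale)))
    return _resample(np.concatenate([x[:a], mid, x[b:]]), L)

# One entry per discrete action of the agent.
STRONG_AUGS = [time_mask, time_permute, crop_resize, time_flip, time_warp]
```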
Reward: a Soft-KNN consistency score computed on a labeled reference set R, measuring how consistently the current sample's embedding shares a label with its k nearest neighbors:

r_t = max_c Σ_{z_j ∈ N_k(z_t)} w_j · 1[y_j = c],  w_j = sim(z_t, z_j) / Σ_{z_l ∈ N_k(z_t)} sim(z_t, z_l),

with N_k(z_t) denoting the k nearest neighbors of z_t in R, sim(·,·) the cosine similarity, y_j the reference labels, and 1[·] the indicator function; the reward is the largest similarity-weighted vote share among the neighbors' labels.
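A sketch of one way to compute this score, matching the SoftKNN(z_w; R) call in the pseudocode of Section 5; the neighbor count k and the non-negative similarity weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def soft_knn_reward(z, ref_z, ref_y, k=10):
    """Similarity-weighted label consistency of embedding z among its k nearest
    neighbors in the labeled reference set R (k is an assumed value).

    z:     (d,)   embedding of the current sample (the weak view in the pseudocode)
    ref_z: (N, d) embeddings of the reference set R
    ref_y: (N,)   integer labels of the reference set
    """
    sims = F.cosine_similarity(z.unsqueeze(0), ref_z, dim=-1)   # (N,)
    top_sim, top_idx = sims.topk(k)                             # k nearest neighbors
    weights = top_sim.clamp(min=0)
    weights = weights / weights.sum().clamp(min=1e-8)           # normalized soft weights
    labels = ref_y[top_idx]
    # Reward = largest similarity-weighted vote share among the neighbors' labels.
    votes = torch.zeros(int(ref_y.max()) + 1).scatter_add_(0, labels, weights)
    return votes.max()                                          # scalar in [0, 1]
```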
3. Policy Network Details
The RL agent’s policy network is a transformer, receiving:
- The current sample’s state embedding s_t.
- Embedded representations of the most recent actions and rewards.
- All inputs fused via fully connected layers to a shared dimension.
- Learned positional encodings and self-attention blocks model the temporal policy-reward dynamics.
- Final logits over the action space are produced by an output head followed by a softmax; actions are sampled via a Top-K strategy (sketched below).
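A minimal sketch of such a policy head, assuming PyTorch's nn.TransformerEncoder; the history length, model width, number of layers, and the Top-K value are illustrative choices rather than values from the paper.

```python
import torch
import torch.nn as nn

class AugPolicy(nn.Module):
    """Transformer policy pi_phi over augmentation actions, conditioned on the
    current state embedding plus a short history of (action, reward) pairs."""
    def __init__(self, state_dim=128, n_actions=5, d_model=64, history=8, n_layers=2):
        super().__init__()
        # Fuse heterogeneous inputs to a shared dimension.
        self.state_proj = nn.Linear(state_dim, d_model)
        self.action_emb = nn.Embedding(n_actions, d_model)
        self.reward_proj = nn.Linear(1, d_model)
        self.pos = nn.Parameter(torch.zeros(1, 1 + 2 * history, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, state, past_actions, past_rewards):
        # state: (B, state_dim); past_actions: (B, H) long; past_rewards: (B, H)
        tokens = torch.cat([
            self.state_proj(state).unsqueeze(1),            # current state token
            self.action_emb(past_actions),                  # H action tokens
            self.reward_proj(past_rewards.unsqueeze(-1)),   # H reward tokens
        ], dim=1) + self.pos
        h = self.encoder(tokens)
        logits = self.head(h[:, 0])                         # read out from the state token
        return torch.log_softmax(logits, dim=-1)            # log-probabilities over actions


def topk_sample(log_probs, k=3):
    """Sample among the top-k most probable actions (renormalized over the top-k)."""
    top_lp, top_idx = log_probs.topk(k, dim=-1)
    choice = torch.distributions.Categorical(logits=top_lp).sample()
    return top_idx.gather(-1, choice.unsqueeze(-1)).squeeze(-1)
```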
Policy gradients use the REINFORCE++ update

L_p(φ) = − Σ_t log π_φ(a_t | s_t) · A_t − β H(π_φ),  A_t = r_t − mean_batch(r),

where the entropy term H(π_φ) fosters exploration, A_t is the advantage relative to a batch-mean reward baseline, and φ is updated via gradient descent.
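A minimal sketch of this update under the formulation above; the entropy coefficient β is an unspecified hyperparameter, and the batch-mean baseline follows the pseudocode in Section 5.

```python
import torch

def policy_loss(log_probs, actions, rewards, beta=0.01):
    """REINFORCE-style loss with a batch-mean baseline and entropy regularization.

    log_probs: (B, n_actions) log pi_phi(.|s_t) for the batch
    actions:   (B,) sampled action indices a_t
    rewards:   (B,) Soft-KNN rewards r_t
    """
    advantages = rewards - rewards.mean()                        # A_t = r_t - mean(r)
    chosen_lp = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()  # H(pi_phi)
    return -(chosen_lp * advantages.detach()).sum() - beta * entropy
```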
4. Contrastive Learning Objective
RL-BioAug employs SimCLR's InfoNCE objective. For the positive pair (z_w^i, z_s^i) formed by the two views of sample i, with the other in-batch embeddings as negatives,

ℓ_i = − log [ exp(sim(z_w^i, z_s^i)/τ) / Σ_{j ≠ i} exp(sim(z_w^i, z_j)/τ) ],

where sim(·,·) is cosine similarity and τ a temperature; the contrastive loss L_c averages ℓ_i over the batch. A minimal implementation sketch follows the list below.
Each sample produces two augmented views:
- Weak augmentation: fixed jitter + scaling.
- Strong augmentation: a transformation selected by the agent π_φ.
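A minimal NT-Xent-style sketch of this objective, assuming in-batch negatives over the 2B weak/strong embeddings; the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def info_nce(z_weak, z_strong, temperature=0.1):
    """SimCLR-style contrastive loss: each weak/strong pair is positive,
    all other samples in the batch act as negatives."""
    B = z_weak.shape[0]
    z = F.normalize(torch.cat([z_weak, z_strong], dim=0), dim=-1)  # (2B, d)
    sim = z @ z.t() / temperature                                  # (2B, 2B) cosine / tau
    sim.fill_diagonal_(float('-inf'))                              # exclude self-similarity
    # Positives: the i-th weak view pairs with the i-th strong view, and vice versa.
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets.to(sim.device))
```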
5. Training Strategy and Label Utilization
Training proceeds in two phases.
Phase 1 (agent pre-training):
- A 10% labeled split forms the reference set R; the remaining data constitutes the unlabeled pool used for sampling.
- The RL agent learns its augmentation policy from reward signals alone, without propagating label supervision into the encoder.
Phase 2 (self-supervised encoder training):
- The policy π_φ is frozen.
- For each sample, the frozen agent selects the strong augmentation with the highest policy probability (argmax); the encoder f_θ is trained with the contrastive loss over all (unlabeled) data.
Pseudocode for the loop is:
```
Initialize encoder θ, policy φ
Split 10% labeled → Reference set R; rest unlabeled for sampling

for epoch in 1…N1 (Phase 1):
    for each batch of unlabeled x:
        for each sample x_i in batch:
            s_t = f_θ(frozen)(x_i)
            a_t ∼ TopK_sample( π_φ(·|s_t) )
            x_strong = Aug_strong(x_i; a_t)
            x_weak   = Aug_weak(x_i)

            # self-supervised update
            z_w = f_θ(x_weak); z_s = f_θ(x_strong)
            L_c = InfoNCE({z_w, z_s})
            θ ← θ − η_enc ∇_θ L_c

            # compute reward on z_w (or z_s)
            r_t = SoftKNN(z_w; R)
            collect (s_t, a_t, r_t)

        # after batch: compute A_t = r_t − mean(r) over batch
        L_p = −∑ log π_φ(a_t|s_t)·A_t − β H(π_φ)
        φ ← φ − η_agent ∇_φ L_p

Freeze π_φ
for epoch in 1…N2 (Phase 2):
    for each batch x:
        for x_i in batch:
            a_i = argmax π_φ(·| f_θ(frozen)(x_i) )
            x_strong = Aug_strong(x_i; a_i)
            x_weak   = Aug_weak(x_i)
            compute InfoNCE, update θ only
```
6. Experimental Validation
RL-BioAug was evaluated on Sleep-EDFX (5-class sleep staging) and CHB-MIT (binary seizure detection) datasets. Results reveal:
| Dataset | Random Composite (MF1) | RL-BioAug (MF1) | Absolute Gain |
|---|---|---|---|
| Sleep-EDFX | ~59.86% | 69.55% | +9.69% |
| CHB-MIT | ~62.70% | 71.50% | +8.80% |
Analysis of learned augmentation strategies:
- On Sleep-EDFX, Time Masking was selected with ~62% probability.
- On CHB-MIT, Crop & Resize dominated at ~77% probability.
7. Context, Rationales, and Implications
EEG’s pronounced non-stationarity necessitates adaptive data augmentation: different neural states (e.g., REM, seizure) exhibit variable tolerance to distortions; fixed or random policies risk under- or over-distortion. RL-BioAug frames augmentation selection as a reinforcement learning problem, enabling per-sample policy adaptation that maximizes downstream representation quality (as measured by Soft-KNN consistency). Entropy regularization and Top-K sampling balance exploration and exploitation in policy selection.
Label-efficient RL guidance (only 10% of labels needed) allows the self-supervised encoder to function without direct label supervision, suggesting wider applicability to unlabeled biomedical contexts. The framework’s capability to supplant heuristic-based augmentation approaches and its demonstrated improvements in Macro-F1 scores indicate its significance for autonomous EEG data augmentation (Lee et al., 20 Jan 2026). A plausible implication is that similar RL-driven augmentation policies may benefit other non-stationary time-series domains with limited labeled data.