
RL-BioAug Framework for EEG Augmentation

Updated 23 January 2026
  • The paper demonstrates that RL-BioAug significantly improves EEG representation quality by employing a transformer-based RL agent to choose augmentation strategies using only 10% label guidance.
  • RL-BioAug is a framework that integrates a self-supervised encoder with context-aware augmentation, adapting to EEG non-stationarity and outperforming heuristic methods.
  • Experimental results show up to 9.7% absolute gain in Macro-F1 scores on Sleep-EDFX and CHB-MIT datasets, highlighting its effectiveness in sleep staging and seizure detection.

RL-BioAug is a label-efficient reinforcement learning (RL) framework that establishes an autonomous paradigm for data augmentation in self-supervised electroencephalography (EEG) representation learning. RL-BioAug employs an RL agent, guided by only a minimal fraction (10%) of labeled data, to determine context-appropriate augmentation strategies for each data sample. Its primary aim is to improve representation quality for downstream tasks under the inherent non-stationarity of EEG signals, outperforming heuristic or random composite augmentation techniques (Lee et al., 20 Jan 2026).

1. System Architecture

RL-BioAug integrates three major modules: a self-supervised encoder, a data-augmentation module, and a transformer-based RL agent. The encoder $f_\theta$ is implemented as a 1D ResNet-18 mapping an EEG segment $x$ to an embedding $z = f_\theta(x)$. The data-augmentation module generates a "weak" view ($x_{\text{weak}}$) using fixed small jitter and scaling, and a "strong" view ($x_{\text{strong}}$) by applying augmentations chosen by the RL agent. The RL agent $\pi_\phi$ uses both the current embedding (state $s_t$) and a history of past actions and rewards to generate a probability distribution over augmentation actions, sampling one per instance.
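The following is a minimal PyTorch sketch of this interface: a small 1D CNN stands in for the paper's 1D ResNet-18 encoder $f_\theta$, and `weak_augment` illustrates the fixed jitter-plus-scaling weak view. Layer sizes, noise levels, and function names are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class EEGEncoder(nn.Module):
    """Stand-in for f_theta: maps a 1-channel EEG segment (B, 1, T) to an embedding z of shape (B, d).
    A shallow 1D CNN is used purely for illustration; the paper uses a 1D ResNet-18."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=7, stride=2, padding=3), nn.BatchNorm1d(32), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2, padding=2), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=3, stride=2, padding=1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.proj = nn.Linear(128, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).squeeze(-1)   # (B, 128)
        return self.proj(h)                # (B, d)

def weak_augment(x: torch.Tensor, jitter_std: float = 0.01, scale_range: float = 0.05) -> torch.Tensor:
    """Fixed 'weak' view: small Gaussian jitter plus a per-sample amplitude scaling."""
    noise = torch.randn_like(x) * jitter_std
    scale = 1.0 + (torch.rand(x.size(0), 1, 1, device=x.device) * 2 - 1) * scale_range
    return (x + noise) * scale
```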

The training loop, executed per batch and sample, consists of:

  1. Computing $s_t$ using the frozen encoder $f_\theta$.
  2. Having $\pi_\phi$ select an augmentation $a_t$ given $(s_t, a_{t-K:t-1}, r_{t-K:t-1})$.
  3. Augmenting $x$ into $x_{\text{strong}}$ (using $a_t$) and $x_{\text{weak}}$ (using fixed jitter + scaling).
  4. Updating $f_\theta$ via the InfoNCE loss between the two views.
  5. Computing the RL agent's reward $r_t$ (using the Soft-KNN metric and the 10% labeled reference set), then updating $\phi$ via a policy-gradient step with entropy regularization.

2. Reinforcement-Learning Components

The RL agent operates with:

  • State space: $s_t = f_\theta(x_t) \in \mathbb{R}^d$, representing the encoder's output for each EEG segment.
  • Action space: Discrete, $A = \{1,2,3,4,5\}$, each element corresponding to a distinct "strong" augmentation (a minimal implementation sketch of these actions appears after this list):

    1. Time Masking: a random interval is zeroed out, with length $L \sim \text{Uniform}(L_\text{min}, L_\text{max})$.
    2. Time Permutation: the segment sequence is shuffled.
    3. Crop & Resize: a contiguous window of size $\alpha T$ is cropped and resized back to length $T$.
    4. Time Flip: the time axis is reversed.
    5. Time Warp: a random subinterval is sped up or slowed down by a factor $v \sim \text{Uniform}(v_\text{min}, v_\text{max})$.
  • Reward: a Soft-KNN consistency score calculated on a labeled reference set $R$, measuring whether the embedding $z_i$ of the current sample consistently shares labels with its $K$ nearest neighbors (a minimal implementation sketch of this reward also follows below):

$$r_t = P(y_i \mid z_i) = \frac{\sum_{j\in\mathcal{N}_i} \exp(\text{sim}(z_i,z_j)/\tau) \cdot \mathbb{I}[y_j = y_i]}{\sum_{k\in\mathcal{N}_i} \exp(\text{sim}(z_i,z_k)/\tau)}$$

with $\mathcal{N}_i$ denoting the $K$ nearest neighbors, $\text{sim}$ cosine similarity, and $r_t \in [0,1]$.
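Below is a minimal sketch of the five strong-augmentation actions, assuming single-channel EEG segments stored as 1D NumPy arrays; the interval fractions, block count, and warp factors are illustrative defaults rather than the paper's hyperparameters.

```python
import numpy as np

def time_mask(x, l_min=0.05, l_max=0.2):
    """Zero out a random contiguous interval whose length is Uniform(l_min, l_max) of the segment."""
    x = x.copy()
    T = len(x)
    L = int(np.random.uniform(l_min, l_max) * T)
    start = np.random.randint(0, T - L + 1)
    x[start:start + L] = 0.0
    return x

def time_permute(x, n_blocks=4):
    """Split the segment into equal blocks and shuffle their order."""
    blocks = np.array_split(x, n_blocks)
    np.random.shuffle(blocks)
    return np.concatenate(blocks)

def crop_resize(x, alpha=0.5):
    """Crop a contiguous window of size alpha*T and linearly resample it back to length T."""
    T = len(x)
    w = int(alpha * T)
    start = np.random.randint(0, T - w + 1)
    crop = x[start:start + w]
    return np.interp(np.linspace(0, w - 1, T), np.arange(w), crop)

def time_flip(x):
    """Reverse the time axis."""
    return x[::-1].copy()

def time_warp(x, v_min=0.8, v_max=1.25):
    """Speed up or slow down a random subinterval by a factor v, then resample to the original length."""
    T = len(x)
    a, b = sorted(np.random.randint(0, T, size=2))
    if b - a < 2:
        return x.copy()
    v = np.random.uniform(v_min, v_max)
    new_len = max(2, int((b - a) / v))
    warped = np.interp(np.linspace(0, b - a - 1, new_len), np.arange(b - a), x[a:b])
    out = np.concatenate([x[:a], warped, x[b:]])
    return np.interp(np.linspace(0, len(out) - 1, T), np.arange(len(out)), out)

STRONG_AUGS = {1: time_mask, 2: time_permute, 3: crop_resize, 4: time_flip, 5: time_warp}
```

Dispatching a sampled action $a_t$ then amounts to `STRONG_AUGS[a_t](x_i)`.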
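The Soft-KNN reward can be computed as in the sketch below, which assumes the reference set $R$ is supplied as an embedding matrix with integer labels; the neighborhood size `k`, temperature `tau`, and the majority-label fallback for unlabeled queries are assumptions of this sketch.

```python
import numpy as np

def soft_knn_reward(z, ref_embeddings, ref_labels, y=None, k=10, tau=0.1):
    """Soft-KNN consistency score r_t in [0, 1] for an embedding z against a labeled reference set.

    If the query label y is unknown, the majority label among the k nearest reference
    neighbors is used as a proxy (an assumption of this sketch, not stated in the paper).
    """
    # Cosine similarity between z and every reference embedding.
    z_n = z / (np.linalg.norm(z) + 1e-8)
    ref_n = ref_embeddings / (np.linalg.norm(ref_embeddings, axis=1, keepdims=True) + 1e-8)
    sims = ref_n @ z_n                                   # shape (|R|,)

    # Restrict to the K nearest neighbors N_i.
    nn_idx = np.argsort(-sims)[:k]
    nn_sims, nn_labels = sims[nn_idx], ref_labels[nn_idx]

    if y is None:
        y = np.bincount(nn_labels).argmax()              # proxy label for the query

    # Temperature-scaled softmax weighting of label agreement.
    w = np.exp(nn_sims / tau)
    return float(np.sum(w * (nn_labels == y)) / np.sum(w))
```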

3. Policy Network Details

The RL agent’s policy network is a transformer, receiving:

  • The current sample’s state embedding $s_t$.
  • Embedded representations of the last $K$ actions and rewards.
  • All inputs are fused via fully connected layers to a shared dimension.
  • Learned positional encodings and $L$ self-attention blocks model the temporal policy-reward dynamics.
  • Final logits over the action space $A$ are produced by an output head followed by a softmax; actions are sampled via a Top-K strategy (a sketch of such a policy network follows this list).
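The following is a minimal PyTorch sketch of a policy network with this structure. The token layout (one state token plus $K$ action/reward history tokens), hidden sizes, number of attention heads, and the `top_k_sample` helper are assumptions made for illustration, not the authors' architecture details.

```python
import torch
import torch.nn as nn

class AugmentationPolicy(nn.Module):
    """Transformer policy over a state token plus K past (action, reward) tokens."""
    def __init__(self, state_dim=128, n_actions=5, d_model=64, n_layers=2, history=8):
        super().__init__()
        self.state_proj = nn.Linear(state_dim, d_model)
        self.action_emb = nn.Embedding(n_actions + 1, d_model)        # index n_actions = "no action yet"
        self.reward_proj = nn.Linear(1, d_model)
        self.pos_emb = nn.Parameter(torch.zeros(1, history + 1, d_model))  # learned positional encodings
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, state, past_actions, past_rewards):
        # state: (B, state_dim); past_actions: (B, K) long; past_rewards: (B, K) float
        hist = self.action_emb(past_actions) + self.reward_proj(past_rewards.unsqueeze(-1))
        tokens = torch.cat([self.state_proj(state).unsqueeze(1), hist], dim=1) + self.pos_emb
        h = self.blocks(tokens)
        return self.head(h[:, 0])                                     # logits over the action space

def top_k_sample(logits, k=3):
    """Sample an action from the renormalized top-k of the softmax distribution."""
    probs = torch.softmax(logits, dim=-1)
    top_p, top_i = probs.topk(k, dim=-1)
    idx = torch.multinomial(top_p / top_p.sum(dim=-1, keepdim=True), 1)
    return top_i.gather(-1, idx).squeeze(-1)
```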

Policy gradients use the REINFORCE++ update:

$$A_t = r_t - b, \qquad b = \frac{1}{|B|}\sum_{i \in \text{batch}} r_i$$

$$L_{\text{policy}} = -\mathbb{E}\big[ \log \pi_\phi(a_t \mid s_t) \cdot A_t \big] - \beta \gamma H(\pi_\phi)$$

where the entropy term $H(\pi_\phi)$ fosters exploration, and $\phi$ is updated via gradient descent.
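A sketch of one such policy-gradient step is shown below, assuming batched tensors of log-probabilities and rewards plus a scalar entropy estimate; the coefficient name `beta_gamma` simply mirrors the $\beta\gamma$ factor above, and the use of a generic PyTorch optimizer is an assumption.

```python
import torch

def policy_gradient_step(log_probs, rewards, entropy, optimizer, beta_gamma=0.01):
    """One REINFORCE-style update with a batch-mean baseline and entropy regularization.

    log_probs: (B,) log pi_phi(a_t | s_t) for the sampled actions (requires grad)
    rewards:   (B,) Soft-KNN rewards r_t in [0, 1]
    entropy:   scalar estimate of H(pi_phi) over the batch (requires grad)
    """
    baseline = rewards.mean()                      # b = mean reward over the batch
    advantage = (rewards - baseline).detach()      # A_t = r_t - b (no gradient through rewards)
    loss = -(log_probs * advantage).mean() - beta_gamma * entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```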

4. Contrastive Learning Objective

RL-BioAug employs SimCLR's InfoNCE objective:

$$L_{\text{contrastive}} = -\mathbb{E}_{i \in B} \log \left( \frac{\exp(\text{sim}(z_i^{\text{weak}}, z_i^{\text{strong}})/\tau)}{\sum_{j=1}^{2|B|} \exp(\text{sim}(z_i^{\text{weak}}, z_j)/\tau)} \right)$$

Each sample produces two augmented views:

  • Weak augmentation: fixed jitter + scaling.
  • Strong augmentation: transformation selected by the agent $\pi_\phi$.
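A compact sketch of this objective in PyTorch is given below, assuming embedding matrices of shape (B, d) for the weak and strong views and following the usual SimCLR convention of excluding self-similarity from the denominator; the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def info_nce(z_weak, z_strong, tau=0.1):
    """SimCLR-style InfoNCE over a batch of weak/strong embedding pairs, each of shape (B, d)."""
    z = torch.cat([F.normalize(z_weak, dim=1), F.normalize(z_strong, dim=1)], dim=0)   # (2B, d)
    sim = z @ z.t() / tau                              # temperature-scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))                  # drop self-similarity from the denominator
    B = z_weak.size(0)
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)]).to(z.device)     # positive indices
    return F.cross_entropy(sim, targets)
```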

5. Training Strategy and Label Utilization

Training proceeds in two phases.

Phase 1 (agent pre-training):

  • The 10% of data with labels forms the reference set $R$; the remaining data serves as the unlabeled pool for sampling.
  • The RL agent $\pi_\phi$ learns an augmentation policy using only reward signals, without propagating label supervision into the encoder.

Phase 2 (self-supervised encoder training):

  • $\pi_\phi$ is frozen.
  • For each sample, the agent selects the strong augmentation (via the argmax of $\pi_\phi$); the encoder $f_\theta$ is trained with the contrastive loss over all (unlabeled) data.

Pseudocode for the loop is:

Initialize encoder θ, policy φ
Split 10% labeled → Reference set R; rest unlabeled for sampling
for epoch in 1…N1 (Phase 1):
  for each batch of unlabeled x:
    for each sample x_i in batch:
      s_t = f_θ(frozen)(x_i)
      a_t ← TopK_sample( π_φ(·|s_t) )
      x_strong = Aug_strong(x_i; a_t)
      x_weak   = Aug_weak(x_i)
      # self-supervised update
      z_w = f_θ(x_weak);  z_s = f_θ(x_strong)
      L_c = InfoNCE({z_w, z_s})
      θ ← θ − η_enc ∇_θ L_c
      # compute reward on z_w (or z_s)
      r_t = SoftKNN(z_w; R)
      collect (s_t, a_t, r_t)
    # after batch:
    compute A_t = r_t − mean(r) over batch
    L_p = − log π_φ(a_t|s_t)·A_t − βγ H(π_φ)
    φ ← φ − η_agent ∇_φ L_p
Freeze π_φ
for epoch in 1…N2 (Phase 2):
  for each batch x:
    for x_i in batch:
      a_i = argmax π_φ(·| f_θ(frozen)(x_i) )
      x_strong = Aug_strong(x_i; a_i)
      x_weak   = Aug_weak(x_i)
    compute InfoNCE, update θ only

This approach minimizes the need for expert intervention and labels, leveraging context-aware augmentation strategies to maximize the discriminability of the learned self-supervised representation space.

6. Experimental Validation

RL-BioAug was evaluated on Sleep-EDFX (5-class sleep staging) and CHB-MIT (binary seizure detection) datasets. Results reveal:

| Dataset | Random Composite (MF1) | RL-BioAug (MF1) | Absolute Gain |
|---|---|---|---|
| Sleep-EDFX | ~59.86% | 69.55% | +9.69% |
| CHB-MIT | ~62.70% | 71.50% | +8.80% |

Analysis of learned augmentation strategies:

  • On Sleep-EDFX, Time Masking was selected with ~62% probability.
  • On CHB-MIT, Crop & Resize dominated at ~77% probability.

7. Context, Rationales, and Implications

EEG’s pronounced non-stationarity necessitates adaptive data augmentation: different neural states (e.g., REM, seizure) exhibit variable tolerance to distortions; fixed or random policies risk under- or over-distortion. RL-BioAug frames augmentation selection as a reinforcement learning problem, enabling per-sample policy adaptation that maximizes downstream representation quality (as measured by Soft-KNN consistency). Entropy regularization and Top-K sampling balance exploration and exploitation in policy selection.

Label-efficient RL guidance (only 10% of labels needed) allows the self-supervised encoder to function without direct label supervision, suggesting wider applicability to unlabeled biomedical contexts. The framework’s capability to supplant heuristic-based augmentation approaches and its demonstrated improvements in Macro-F1 scores indicate its significance for autonomous EEG data augmentation (Lee et al., 20 Jan 2026). A plausible implication is that similar RL-driven augmentation policies may benefit other non-stationary time-series domains with limited labeled data.
