RL-BioAug Framework for EEG Augmentation
- The paper demonstrates that RL-BioAug significantly improves EEG representation quality by employing a transformer-based RL agent to choose augmentation strategies using only 10% label guidance.
- RL-BioAug is a framework that integrates a self-supervised encoder with context-aware augmentation, adapting to EEG non-stationarity and outperforming heuristic methods.
- Experimental results show up to 9.7% absolute gain in Macro-F1 scores on Sleep-EDFX and CHB-MIT datasets, highlighting its effectiveness in sleep staging and seizure detection.
RL-BioAug is a label-efficient reinforcement learning (RL) framework that establishes an autonomous paradigm for data augmentation in self-supervised electroencephalography (EEG) representation learning. RL-BioAug employs an RL agent, guided by only a minimal fraction (10%) of labeled data, to determine context-appropriate augmentation strategies for each data sample. Its primary aim is to improve representation quality for downstream tasks under the inherent non-stationarity of EEG signals, outperforming heuristic or random composite augmentation techniques (Lee et al., 20 Jan 2026).
1. System Architecture
RL-BioAug integrates three major modules: a self-supervised encoder, a data-augmentation module, and a transformer-based RL agent. The encoder f_θ is implemented as a 1D ResNet-18 mapping an EEG segment x to an embedding z = f_θ(x). The data-augmentation module generates a "weak" view (x_weak) using fixed small jitter and scaling, and a "strong" view (x_strong) by applying augmentations chosen by the RL agent. The RL agent π_φ uses both the current embedding (state s_t) and a history of past actions and rewards to produce a probability distribution over augmentation actions, sampling one per instance.
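The paper summary above names only the backbone, so the following is a minimal sketch of a 1D ResNet-18-style encoder in PyTorch; the kernel sizes, channel widths, and embedding dimension are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class BasicBlock1D(nn.Module):
    """ResNet basic block with 1D convolutions for raw EEG segments."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv1d(in_ch, out_ch, kernel_size=7, stride=stride,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm1d(out_ch)
        self.conv2 = nn.Conv1d(out_ch, out_ch, kernel_size=7, padding=3, bias=False)
        self.bn2 = nn.BatchNorm1d(out_ch)
        self.shortcut = (nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm1d(out_ch))
            if stride != 1 or in_ch != out_ch else nn.Identity())

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + self.shortcut(x))


class EEGEncoder1D(nn.Module):
    """ResNet-18-style 1D encoder f_theta: EEG segment -> embedding z."""
    def __init__(self, in_ch=1, embed_dim=128):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv1d(in_ch, 64, kernel_size=15, stride=2, padding=7, bias=False),
            nn.BatchNorm1d(64), nn.ReLU(), nn.MaxPool1d(3, stride=2, padding=1))
        blocks, prev = [], 64
        for width in (64, 128, 256, 512):   # two basic blocks per stage ("18-layer" layout)
            blocks += [BasicBlock1D(prev, width, stride=2), BasicBlock1D(width, width)]
            prev = width
        self.stages = nn.Sequential(*blocks)
        self.head = nn.Linear(512, embed_dim)

    def forward(self, x):                    # x: (batch, channels, time)
        h = self.stages(self.stem(x))        # (batch, 512, time')
        return self.head(h.mean(dim=-1))     # global average pool over time -> z
```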
The training loop, executed per batch and sample, consists of:
- Computing s_t = f_θ(x_i) using the (frozen) encoder f_θ.
- Having π_φ select an augmentation action a_t given s_t.
- Augmenting x_i into x_strong (using a_t) and x_weak (using fixed jitter + scaling).
- Updating θ via the InfoNCE loss between the two views.
- Computing the RL agent's reward r_t (using the Soft-KNN metric and the 10% labeled reference set), then updating φ via a policy-gradient step with entropy regularization.
2. Reinforcement-Learning Components
The RL agent operates with:
- State space: the encoder embedding s_t = f_θ(x_i) for each EEG segment.
- Action space: discrete, with one action per distinct "strong" augmentation (a minimal sketch of these transforms follows the list):
- Time Masking: a random contiguous interval is zeroed out.
- Time Permutation: the segment is split into sub-sequences whose order is shuffled.
- Crop & Resize: a contiguous window is cropped, then resized back to the original segment length.
- Time Flip: the time axis is reversed.
- Time Warp: a random subinterval is sped up or slowed down.
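Under the descriptions above, the five strong augmentations can be sketched roughly as follows for single-channel 1-D segments; the parameter values (mask fraction, number of permutation chunks, crop ratio, warp factor) are placeholders, since they are not specified in the text.

```python
import numpy as np

def _resample(x, new_len):
    """Linear resampling of a 1-D signal to a new length."""
    old_len = x.shape[-1]
    return np.interp(np.linspace(0, old_len - 1, new_len), np.arange(old_len), x)

def time_mask(x, max_frac=0.2):
    """Zero out a random contiguous interval (mask fraction is assumed)."""
    x, L = x.copy(), x.shape[-1]
    m = np.random.randint(1, max(2, int(max_frac * L)))
    start = np.random.randint(0, L - m)
    x[start:start + m] = 0.0
    return x

def time_permute(x, n_segments=5):
    """Split the segment into chunks and shuffle their order."""
    chunks = np.array_split(x, n_segments)
    np.random.shuffle(chunks)
    return np.concatenate(chunks)

def crop_resize(x, crop_frac=0.7):
    """Crop a contiguous window and resample it back to the original length."""
    L = x.shape[-1]
    w = int(crop_frac * L)
    start = np.random.randint(0, L - w)
    return _resample(x[start:start + w], L)

def time_flip(x):
    """Reverse the time axis."""
    return x[::-1].copy()

def time_warp(x, factor=1.25):
    """Stretch or compress a random subinterval, then resample back to the original length."""
    L = x.shape[-1]
    a, b = sorted(np.random.choice(L, size=2, replace=False))
    if b - a < 2:
        return x.copy()
    scale = factor if np.random.rand() < 0.5 else 1.0 / factor
    mid = _resample(x[a:b], max(2, int((b - a) * scale)))
    return _resample(np.concatenate([x[:a], mid, x[b:]]), L)

# One entry per discrete action of the agent.
STRONG_AUGS = [time_mask, time_permute, crop_resize, time_flip, time_warp]
```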
Reward: a Soft-KNN consistency score computed on a labeled reference set R, measuring how consistently the current sample's embedding shares a label with its k nearest neighbors:

r_t = max_c Σ_{z_j ∈ N_k(z_t)} w_j · 1[y_j = c],  w_j = sim(z_t, z_j) / Σ_{z_l ∈ N_k(z_t)} sim(z_t, z_l),

with N_k(z_t) denoting the k nearest neighbors of z_t in R, sim(·,·) the cosine similarity, y_j the reference labels, and 1[·] the indicator function; the reward is the largest similarity-weighted vote share among the neighbors' labels.
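A sketch of one way to compute this score, matching the SoftKNN(z_w; R) call in the pseudocode of Section 5; the neighbor count k and the non-negative similarity weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def soft_knn_reward(z, ref_z, ref_y, k=10):
    """Similarity-weighted label consistency of embedding z among its k nearest
    neighbors in the labeled reference set R (k is an assumed value).

    z:     (d,)   embedding of the current sample (the weak view in the pseudocode)
    ref_z: (N, d) embeddings of the reference set R
    ref_y: (N,)   integer labels of the reference set
    """
    sims = F.cosine_similarity(z.unsqueeze(0), ref_z, dim=-1)   # (N,)
    top_sim, top_idx = sims.topk(k)                             # k nearest neighbors
    weights = top_sim.clamp(min=0)
    weights = weights / weights.sum().clamp(min=1e-8)           # normalized soft weights
    labels = ref_y[top_idx]
    # Reward = largest similarity-weighted vote share among the neighbors' labels.
    votes = torch.zeros(int(ref_y.max()) + 1).scatter_add_(0, labels, weights)
    return votes.max()                                          # scalar in [0, 1]
```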
3. Policy Network Details
The RL agent’s policy network is a transformer, receiving:
- The current sample’s state embedding s_t.
- Embedded representations of the most recent actions and rewards.
- All inputs fused via fully connected layers to a shared dimension.
- Learned positional encodings and self-attention blocks model the temporal policy-reward dynamics.
- Final logits over the action space are produced by an output head followed by a softmax; actions are sampled via a Top-K strategy (sketched below).
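A minimal sketch of such a policy head, assuming PyTorch's nn.TransformerEncoder; the history length, model width, number of layers, and the Top-K value are illustrative choices rather than values from the paper.

```python
import torch
import torch.nn as nn

class AugPolicy(nn.Module):
    """Transformer policy pi_phi over augmentation actions, conditioned on the
    current state embedding plus a short history of (action, reward) pairs."""
    def __init__(self, state_dim=128, n_actions=5, d_model=64, history=8, n_layers=2):
        super().__init__()
        # Fuse heterogeneous inputs to a shared dimension.
        self.state_proj = nn.Linear(state_dim, d_model)
        self.action_emb = nn.Embedding(n_actions, d_model)
        self.reward_proj = nn.Linear(1, d_model)
        self.pos = nn.Parameter(torch.zeros(1, 1 + 2 * history, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, state, past_actions, past_rewards):
        # state: (B, state_dim); past_actions: (B, H) long; past_rewards: (B, H)
        tokens = torch.cat([
            self.state_proj(state).unsqueeze(1),            # current state token
            self.action_emb(past_actions),                  # H action tokens
            self.reward_proj(past_rewards.unsqueeze(-1)),   # H reward tokens
        ], dim=1) + self.pos
        h = self.encoder(tokens)
        logits = self.head(h[:, 0])                         # read out from the state token
        return torch.log_softmax(logits, dim=-1)            # log-probabilities over actions


def topk_sample(log_probs, k=3):
    """Sample among the top-k most probable actions (renormalized over the top-k)."""
    top_lp, top_idx = log_probs.topk(k, dim=-1)
    choice = torch.distributions.Categorical(logits=top_lp).sample()
    return top_idx.gather(-1, choice.unsqueeze(-1)).squeeze(-1)
```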
Policy gradients use the REINFORCE++ update

L_p(φ) = − Σ_t log π_φ(a_t | s_t) · A_t − β H(π_φ),  A_t = r_t − mean_batch(r),

where the entropy term H(π_φ) fosters exploration, A_t is the advantage relative to a batch-mean reward baseline, and φ is updated via gradient descent.
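A minimal sketch of this update under the formulation above; the entropy coefficient β is an unspecified hyperparameter, and the batch-mean baseline follows the pseudocode in Section 5.

```python
import torch

def policy_loss(log_probs, actions, rewards, beta=0.01):
    """REINFORCE-style loss with a batch-mean baseline and entropy regularization.

    log_probs: (B, n_actions) log pi_phi(.|s_t) for the batch
    actions:   (B,) sampled action indices a_t
    rewards:   (B,) Soft-KNN rewards r_t
    """
    advantages = rewards - rewards.mean()                        # A_t = r_t - mean(r)
    chosen_lp = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()  # H(pi_phi)
    return -(chosen_lp * advantages.detach()).sum() - beta * entropy
```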
4. Contrastive Learning Objective
RL-BioAug employs SimCLR's InfoNCE objective. For the positive pair (z_w^i, z_s^i) formed by the two views of sample i, with the other in-batch embeddings as negatives,

ℓ_i = − log [ exp(sim(z_w^i, z_s^i)/τ) / Σ_{j ≠ i} exp(sim(z_w^i, z_j)/τ) ],

where sim(·,·) is cosine similarity and τ a temperature; the contrastive loss L_c averages ℓ_i over the batch. A minimal implementation sketch follows the list below.
Each sample produces two augmented views:
- Weak augmentation: fixed jitter + scaling.
- Strong augmentation: a transformation selected by the agent π_φ.
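A minimal NT-Xent-style sketch of this objective, assuming in-batch negatives over the 2B weak/strong embeddings; the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def info_nce(z_weak, z_strong, temperature=0.1):
    """SimCLR-style contrastive loss: each weak/strong pair is positive,
    all other samples in the batch act as negatives."""
    B = z_weak.shape[0]
    z = F.normalize(torch.cat([z_weak, z_strong], dim=0), dim=-1)  # (2B, d)
    sim = z @ z.t() / temperature                                  # (2B, 2B) cosine / tau
    sim.fill_diagonal_(float('-inf'))                              # exclude self-similarity
    # Positives: the i-th weak view pairs with the i-th strong view, and vice versa.
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets.to(sim.device))
```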
5. Training Strategy and Label Utilization
Training proceeds in two phases.
Phase 1 (agent pre-training):
- A 10% labeled split forms the reference set R; the remaining data constitutes the unlabeled pool used for sampling.
- The RL agent learns its augmentation policy from reward signals alone, without propagating label supervision into the encoder.
Phase 2 (self-supervised encoder training):
- The policy π_φ is frozen.
- For each sample, the frozen agent selects the strong augmentation with the highest policy probability (argmax); the encoder f_θ is trained with the contrastive loss over all (unlabeled) data.
Pseudocode for the loop is:
```
Initialize encoder θ, policy φ
Split 10% labeled → Reference set R; rest unlabeled for sampling

for epoch in 1…N1 (Phase 1):
    for each batch of unlabeled x:
        for each sample x_i in batch:
            s_t = f_θ(frozen)(x_i)
            a_t ∼ TopK_sample( π_φ(·|s_t) )
            x_strong = Aug_strong(x_i; a_t)
            x_weak   = Aug_weak(x_i)

            # self-supervised update
            z_w = f_θ(x_weak); z_s = f_θ(x_strong)
            L_c = InfoNCE({z_w, z_s})
            θ ← θ − η_enc ∇_θ L_c

            # compute reward on z_w (or z_s)
            r_t = SoftKNN(z_w; R)
            collect (s_t, a_t, r_t)

        # after batch: compute A_t = r_t − mean(r) over batch
        L_p = −∑ log π_φ(a_t|s_t)·A_t − β H(π_φ)
        φ ← φ − η_agent ∇_φ L_p

Freeze π_φ
for epoch in 1…N2 (Phase 2):
    for each batch x:
        for x_i in batch:
            a_i = argmax π_φ(·| f_θ(frozen)(x_i) )
            x_strong = Aug_strong(x_i; a_i)
            x_weak   = Aug_weak(x_i)
            compute InfoNCE, update θ only
```
6. Experimental Validation
RL-BioAug was evaluated on Sleep-EDFX (5-class sleep staging) and CHB-MIT (binary seizure detection) datasets. Results reveal:
| Dataset | Random Composite (MF1) | RL-BioAug (MF1) | Absolute Gain |
|---|---|---|---|
| Sleep-EDFX | ~59.86% | 69.55% | +9.69% |
| CHB-MIT | ~62.70% | 71.50% | +8.80% |
Analysis of learned augmentation strategies:
- On Sleep-EDFX, Time Masking was selected with ~62% probability.
- On CHB-MIT, Crop & Resize dominated at ~77% probability.
7. Context, Rationales, and Implications
EEG’s pronounced non-stationarity necessitates adaptive data augmentation: different neural states (e.g., REM, seizure) exhibit variable tolerance to distortions; fixed or random policies risk under- or over-distortion. RL-BioAug frames augmentation selection as a reinforcement learning problem, enabling per-sample policy adaptation that maximizes downstream representation quality (as measured by Soft-KNN consistency). Entropy regularization and Top-K sampling balance exploration and exploitation in policy selection.
Label-efficient RL guidance (only 10% of labels needed) allows the self-supervised encoder to function without direct label supervision, suggesting wider applicability to unlabeled biomedical contexts. The framework’s capability to supplant heuristic-based augmentation approaches and its demonstrated improvements in Macro-F1 scores indicate its significance for autonomous EEG data augmentation (Lee et al., 20 Jan 2026). A plausible implication is that similar RL-driven augmentation policies may benefit other non-stationary time-series domains with limited labeled data.