Dynamic On-the-Fly Augmentation

Updated 25 March 2026

On-the-fly augmentation is a dynamic process that generates or modifies data, model inputs, or computations during execution instead of relying on precomputed resources.
It enhances model robustness by adapting to sample difficulty, data scarcity, and evolving sampling needs through stochastic perturbations and real-time retrieval.
This approach spans domains like computer vision, NLP, and time-series forecasting, yielding measurable improvements in performance and resource efficiency.

On-the-fly augmentation refers to the dynamic creation or modification of data, model inputs, or computation during algorithm execution (that is, during training or inference) rather than reliance on static, precomputed resources. The approach is prominent in machine learning, computer vision, natural language processing, time-series analysis, structured planning, and real-time systems. It encompasses both data-level transformations (e.g., stochastic data perturbations per batch) and knowledge or computational enhancements (e.g., real-time retrieval or definition infusion), enabling models to adapt, generalize, or operate efficiently under constraints such as limited memory, data scarcity, or evolving sampling requirements.

1. Foundational Principles and Motivation

On-the-fly augmentation is motivated by the limitations of fixed, offline augmentation pipelines. Key drivers include:

Dynamic Sample Difficulty: The relevance or difficulty of a sample may change as the model adapts, making fixed augmentation suboptimal. Mechanisms such as loss-based difficulty assessment enable targeted augmentation, as in the MoDify framework’s Momentum Difficulty scheduling (Jiang et al., 2023).
Online Data Availability: In streaming or data-scarce applications, not all samples are present upfront; on-the-fly generation expands the training set in response to incoming data, as with online active learning and “Augmented Queues” (Malialis et al., 2022).
Resource and Storage Efficiency: For high-dimensional domains (e.g., 3D segmentation (Jain et al., 29 Sep 2025)), storing all augmented variants is infeasible; on-the-fly synthesis controls storage costs.
Real-Time Information Integration: Models may require context-dependent external knowledge or retrieval (e.g., episodic memory for LLMs (Wang et al., 2020), definition lookup (Munnangi et al., 2024)), which cannot be statically embedded.
Adaptivity to Model State: Adaptive augmentation based on current model gradients, parameter updates, or loss variances supports “flow channel” training, balancing under- and overfitting (as in SADA (Yang et al., 1 Oct 2025)).

2. Methodologies and Algorithmic Frameworks

On-the-fly augmentation encompasses a broad set of paradigms, distinguished by the target domain and transformation strategy.

2.1 Data-Level Transformation

Image Domain: Randomized operations such as RGB channel shuffling (6 permutations) are applied to each batch, with the probability of augmentation controlled by sample difficulty scores tracked in a global loss bank. For sample $x_i$, the augmentation probability is $p_i = 1 - d_i$, where $d_i$ is the rank-based difficulty (Jiang et al., 2023).
Speech and Sequence Data: In ASR, dynamic time stretching is achieved by stochastic resampling in the feature space, sub-sequence sampling exploits token-level alignment, and SpecAugment applies masking in time/frequency dimensions. Aligned Data Augmentation (ADA) uses forced alignments to splice real speech segments, with token replacements selected via a masked language model or at random (Lam et al., 2021, Nguyen et al., 2019).
Time-Series Forecasting: Each mini-batch is augmented by generating synthetic time series using STL decomposition and moving-block bootstrap (STL+MBB). For each true series in the batch, one synthetic variant is created via stochastic resampling of the remainder component, ensuring balanced real/synthetic representation (Cerqueira et al., 2024).
Structured Planning: In classical planning, macros (multi-action sequences) are computed on-the-fly within a localized state neighborhood, incrementally growing an action graph through “transitive” and “apply” operations (0810.1186).
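The difficulty-gated rule $p_i = 1 - d_i$ from the image-domain item can be sketched in a few lines. This is a minimal illustration, not the MoDify implementation: the function names are hypothetical, images are plain nested lists of (r, g, b) tuples, and difficulty is ranked within the losses passed in rather than read from a global loss bank. The convention that the lowest loss maps to difficulty 0 is also an assumption.

```python
import random
from itertools import permutations

# All 6 orderings of the three colour channels (the identity is one of them).
CHANNEL_ORDERS = list(permutations(range(3)))


def rank_difficulty(losses):
    """Map per-sample losses to rank-based difficulty in [0, 1].

    Assumed convention: lowest loss -> 0.0, highest loss -> 1.0.
    """
    n = len(losses)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: losses[i])
    difficulty = [0.0] * n
    for rank, i in enumerate(order):
        difficulty[i] = rank / (n - 1)
    return difficulty


def shuffle_channels(pixel_rows, order):
    """Reorder the channels of an image stored as rows of (r, g, b) tuples."""
    return [[tuple(px[c] for c in order) for px in row] for row in pixel_rows]


def augment_batch(images, losses, rng=random):
    """Apply an RGB shuffle to sample i with probability p_i = 1 - d_i."""
    out = []
    for img, d in zip(images, rank_difficulty(losses)):
        if rng.random() < 1.0 - d:
            out.append(shuffle_channels(img, rng.choice(CHANNEL_ORDERS)))
        else:
            out.append(img)
    return out
```

Under this convention the hardest sample in the batch has $d_i = 1$ and is never augmented, while the easiest sample always draws a permutation (which may be the identity).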

2.2 Knowledge and Computational Augmentation

Information Retrieval: Pre-trained LLMs such as GPT-2 are dynamically augmented at inference by querying an external text corpus for context articles, concatenating retrieved content into the input. Episodic memory is continuously indexed and adapted as new documents arrive (Wang et al., 2020).
Hybrid Encoding Architectures: In retrieval-augmented question answering, LUMEN precomputes passage encodings offline and applies a lightweight “live encoder” conditioned on the current query at inference (“on-the-fly encoding”). Only a small fraction α of encoder layers are run live per instance, balancing compute cost and quality (Jong et al., 2023).
Definition Injection: For biomedical NER, LLMs are augmented by real-time lookup and insertion of human-curated UMLS definitions for candidate entities into the prompt, prompting the model to “revise” its extraction (Munnangi et al., 2024).
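As a rough illustration of the definition-injection step, the sketch below looks up definitions for candidate entities and assembles a revision prompt. Everything here is an assumption made for illustration: the `DEFINITIONS` dictionary is a toy stand-in for a curated source such as UMLS (its entries are not real UMLS text), the function names are hypothetical, and the prompt wording is not taken from Munnangi et al. (2024).

```python
# Toy stand-in for a curated definition source; entries are illustrative only.
DEFINITIONS = {
    "aspirin": "A drug used to reduce pain, fever, and inflammation.",
    "myocardial infarction": "Death of heart muscle caused by loss of blood supply.",
}


def lookup_definitions(candidates, kb):
    """Return {candidate: definition} for candidates found in the knowledge base."""
    found = {}
    for term in candidates:
        definition = kb.get(term.lower())
        if definition is not None:
            found[term] = definition
    return found


def build_revision_prompt(text, candidates, kb):
    """Assemble a prompt asking the model to revise its extraction."""
    lines = [
        "Text: " + text,
        "Candidate entities: " + ", ".join(candidates),
        "Definitions:",
    ]
    for term, definition in lookup_definitions(candidates, kb).items():
        lines.append("- " + term + ": " + definition)
    lines.append("Using the definitions above, revise the candidate entity list.")
    return "\n".join(lines)
```

The lookup runs at inference time for each input, so the prompt only carries definitions relevant to the current candidates; terms missing from the knowledge base are simply left without a definition line.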

2.3 Adaptive and Sample-Aware Strategies

Dynamic Augmentation Strength: SADA estimates each sample’s influence on current parameter updates by projection of gradients and computes the variance over recent history. Stronger augmentation is assigned to stable/invariant samples, while volatile samples receive weaker perturbations (Yang et al., 1 Oct 2025).
Difficulty-Guided Scheduling: MoDify assigns augmentation probabilities inversely correlated with rank-based sample difficulty and only retains samples for gradient updates if their difficulty falls within calibrated thresholds ($T_{easy}$, $T_{hard}$) (Jiang et al., 2023).
On-the-Fly Denoising: In NLP, augmented data are denoised at training time by matching student model predictions to a teacher trained on clean data via soft-label distillation, complemented by a self-regularization loss across dropout runs (Fang et al., 2022).
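The sample-aware idea (stable samples get stronger augmentation, volatile ones weaker) can be sketched with a sliding window of per-sample influence scores. This is not the SADA algorithm: the class name, the way influence scores are supplied, and the linear mapping from normalized variance to strength are all assumptions chosen to keep the example small.

```python
from collections import defaultdict, deque
from statistics import pvariance


class StrengthScheduler:
    """Track recent per-sample influence and map its variance to a strength."""

    def __init__(self, window=5, m_max=1.0):
        self.m_max = m_max  # upper bound on augmentation strength
        self.history = defaultdict(lambda: deque(maxlen=window))

    def update(self, sample_id, influence):
        """Record the latest influence score (e.g., a gradient projection)."""
        self.history[sample_id].append(influence)

    def strengths(self):
        """Return {sample_id: strength}; low variance gives high strength.

        Assumed mapping: strength = m_max * (1 - var / max_var).
        Samples with fewer than two observations are skipped.
        """
        variances = {
            sid: pvariance(list(h))
            for sid, h in self.history.items()
            if len(h) >= 2
        }
        if not variances:
            return {}
        v_max = max(variances.values())
        if v_max == 0:
            return {sid: self.m_max for sid in variances}
        return {sid: self.m_max * (1.0 - v / v_max) for sid, v in variances.items()}
```

In a training loop, `update` would be called once per sample per step and `strengths` consulted before augmenting the next batch; the window length plays the role of the history size discussed above.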

3. Application Domains and Empirical Impact

On-the-fly augmentation is deployed across diverse high-impact ML and computational science applications.

Domain Generalization: MoDify, employing on-the-fly difficulty-aware augmentation with RGB Shuffle, achieved a +12.2 mIoU improvement (36.6%→48.8%) in semantic segmentation over Cityscapes and improved detection by +2.2 mAP over SOTA (Jiang et al., 2023).
Data Stream Classification: Augmented Queues provided substantial boosts in G-mean (e.g., 0.40→0.55 for MNIST-1% on a small memory, matching large-memory baselines) and accelerated convergence by 2–2.5×, without increased permanent memory (Malialis et al., 2022).
Speech Recognition: On-the-fly augmentations such as ADA (LM or random sampling) and SpecAugment combined to yield 9–23% WER reductions over baselines. Dynamic augmentations improved performance beyond static, offline methods (Lam et al., 2021, Nguyen et al., 2019).
Segmentation in Medical Imaging: Inserting GAN-generated synthetic tumors at batch time with GliGAN in nnU-Net delivered state-of-the-art lesion-wise Dice across six tumor classes, while reducing storage requirements by >95% compared to offline storage of augmented volumes (Jain et al., 29 Sep 2025).
Language Modeling: On-the-fly IR augmentation reduced GPT-2 perplexity by up to 15% on Gigaword (zero-shot), and retrieval-augmented event coreference improved F1 by +0.3 on both within- and cross-document settings (Wang et al., 2020, Jong et al., 2023).
Data-Sparse Forecasting: Online STL+MBB time-series augmentation led to consistent, statistically significant reductions in SMAPE across 6 of 8 datasets compared to both standard (no-augmentation) and offline augmented baselines (Cerqueira et al., 2024).
NER in Few-shot Biomedical Settings: Definition augmentation increased GPT-4 F1 by an average of +15% in zero-shot across six datasets, with similar boosts in open LLMs. Gains were strictly correlated with definition relevance (Munnangi et al., 2024).

4. Implementation Considerations and Computational Analysis

Several design patterns and trade-offs are observed in on-the-fly augmentation deployments:

Storage: Only real examples are stored persistently; synthetic (augmented) data are generated and discarded per batch, e.g., O(KM) vs. O(KMN) temporary storage in Augmented Queues (Malialis et al., 2022), and >95% storage reduction in 3D segmentation (Jain et al., 29 Sep 2025).
Compute Overhead: For most low-cost operations (RGB Shuffle, STL+MBB, time stretch), augmentation overhead is negligible relative to training time (<10%). High-dimensional GAN-based augmentations add ~10% wall-clock time per batch (Jain et al., 29 Sep 2025).
Online Hyperparameter Tuning: Key parameters include momentum (λ), window size (L), augmentation strength upper bounds ($m_{max}$), and batchwise real/synthetic ratios. These parameters are usually tuned empirically per modality and task.
Plug-and-Play Modularity: Most methods are designed to be minimally intrusive: they wrap or extend standard pipeline steps and require no policy models, auxiliary networks, or nontrivial cross-batch synchronization (Yang et al., 1 Oct 2025, Cerqueira et al., 2024).
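The generate-and-discard pattern can be illustrated with a batch generator that pairs each real time series with one transient synthetic copy. This is a simplified sketch rather than the method of Cerqueira et al. (2024): a centered moving average stands in for the STL decomposition, only trend and remainder are separated (no seasonal component), and all function names and default parameters are hypothetical.

```python
import random


def moving_average(series, k):
    """Centered moving average used as a crude trend estimate (NOT STL)."""
    half = k // 2
    out = []
    for t in range(len(series)):
        window = series[max(0, t - half): t + half + 1]
        out.append(sum(window) / len(window))
    return out


def moving_block_bootstrap(values, block_len, rng):
    """Resample a sequence by concatenating randomly chosen contiguous blocks."""
    n = len(values)
    block_len = min(block_len, n)
    out = []
    while len(out) < n:
        start = rng.randrange(0, n - block_len + 1)
        out.extend(values[start:start + block_len])
    return out[:n]


def synthesize(series, block_len=4, trend_window=5, rng=random):
    """Keep the trend, bootstrap the remainder, and recombine."""
    trend = moving_average(series, trend_window)
    remainder = [x - t for x, t in zip(series, trend)]
    boot = moving_block_bootstrap(remainder, block_len, rng)
    return [t + r for t, r in zip(trend, boot)]


def augmented_batches(dataset, batch_size, rng=random):
    """Yield batches of real series plus one synthetic variant per real series.

    Synthetic series exist only inside the yielded batch; the stored
    dataset is never extended.
    """
    for i in range(0, len(dataset), batch_size):
        real = dataset[i:i + batch_size]
        yield real + [synthesize(s, rng=rng) for s in real]
```

Because the synthetic series are created inside the generator and dropped after each step, persistent storage stays at the size of the real dataset, and each batch keeps a one-to-one real/synthetic balance.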

5. Limitations, Challenges, and Future Directions

Despite their versatility, on-the-fly augmentation approaches exhibit several challenges:

Augmentation Quality Control: Random or aggressive augmentations can harm difficult or unstable samples. Adaptive strategies (e.g., SADA, MoDify) attempt to mitigate this by estimating difficulty or influence, but may require calibration when labels are noisy or distributions shift (Jiang et al., 2023, Yang et al., 1 Oct 2025).
Domain-Specific Constraints: Certain augmentations necessitate domain knowledge, such as preserving medical realism in 3D tumor synthesis (Jain et al., 29 Sep 2025) or aligning data and label semantics in ASR (Lam et al., 2021). Inadequate modeling can undermine performance.
Denoising and Label Reliability: Augmented data may introduce or amplify label noise, especially in language tasks. On-the-fly denoising via soft-label teacher matching addresses this but depends on the availability of a strong clean-data teacher (Fang et al., 2022).
Scalability with Model/Memory: Some retrieval-augmented frameworks (e.g., LUMEN) shift computational cost to offline precomputation and storage, which requires careful management when precomputed memory banks are very large or when per-task memory recomputation is prohibitive (Jong et al., 2023).
Limited Generalization in Non-Stationary or Novel Domains: Static augmentation functions may not adapt to evolving domains; in online learning or streaming contexts, further research is needed for non-stationary-aware augmentation strategies (Malialis et al., 2022).
Interaction with Optimization Dynamics: Augmentation interacts with sample scheduling (e.g., in MoDify, SADA); inappropriate settings can increase convergence time or miss rare examples if not properly controlled (Jiang et al., 2023, Yang et al., 1 Oct 2025).

A prominent direction is reinforcement or feedback-driven augmentation, where the augmentation module dynamically steers transformations towards the model’s error-prone or under-represented regions, e.g., by integrating online difficulty mining or adversarial feedback between generators and main task networks (Jain et al., 29 Sep 2025).

6. Representative Methods: Comparison Table

Method	Domain	On-the-Fly Mechanism
MoDify	Vision (DG)	Probabilistic RGB-Shuffle by sample difficulty (Jiang et al., 2023)
Augmented Queues	Data Stream (CLF)	Augment in-memory queues at each AL step (Malialis et al., 2022)
ADA (ASR)	Speech Recognition	Token/audio aligned replacements, online LM calls (Lam et al., 2021)
SADA	Vision (General)	Sample-aware augmentation strength by loss variance (Yang et al., 1 Oct 2025)
OnDAT	Time-Series Forecasting	STL+MBB per mini-batch, balanced real/synthetic (Cerqueira et al., 2024)
LUMEN	Language Modelling/QA	Partial live encoding conditioned on input (Jong et al., 2023)
Biomedical Def.	Biomedical NER	Prompt-level knowledge from KB definitions (Munnangi et al., 2024)

Each method represents an instantiation of the on-the-fly augmentation paradigm, leveraging dynamic, context-adaptive transformations or knowledge injection to optimize model generalization, robustness, and resource usage.

References: (Jiang et al., 2023, Malialis et al., 2022, Wang et al., 2020, Lam et al., 2021, Yang et al., 1 Oct 2025, Cerqueira et al., 2024, Nguyen et al., 2019, 0810.1186, Fang et al., 2022, Jong et al., 2023, Jain et al., 29 Sep 2025, Munnangi et al., 2024)