Adaptive Sampling & Domain Randomization
- Adaptive sampling and domain randomization are techniques that enhance learning by actively selecting informative samples and systematically varying simulation parameters.
- They improve sample efficiency, robustness, and transferability in applications such as robotics, vision, and control by targeting model weaknesses.
- These methods enable efficient sim-to-real transfer and generalization in uncertain environments, reducing reliance on large, uniform data sets.
Adaptive sampling and domain randomization are foundational methodologies for robust machine learning, control, and perception in settings where real-world data is scarce, the operational environment is uncertain, or the underlying domain is only partially known. Adaptive sampling refers to procedural selection of data points or simulation scenarios that provide maximal incremental benefit to learning or estimation, often based on the current model’s weaknesses or uncertainties. Domain randomization denotes systematic perturbation of environmental or simulation parameters during training, typically to achieve robustness to out-of-distribution samples, transfer to new domains (“sim-to-real”), or strong generalization across unforeseen settings. Both strategies have independently and jointly enabled substantial advances in robotics, vision, control, reinforcement learning, and surrogate-based uncertainty quantification, by bridging the gap between controlled simulations and unpredictable real-world phenomena.
1. Fundamental Concepts and Historical Context
Domain randomization emerged as a practical answer to the “reality gap” in robotics and computer vision. Instead of chasing maximal simulation fidelity, the approach immerses the learner in wide distributions of possible states, visuals, or physical parameters, so that the real world appears to the model as just another variation of the simulation (Tobin et al., 2017). Adaptive sampling, by contrast, builds on the recognition that not all training points are equally valuable: sample efficiency, stability, and robustness improve when informative or challenging cases are actively prioritized, whether in synthetic parameter space (Khirodkar et al., 2018, Mehta et al., 2019, Ramos et al., 2019), spatial feature space (Huang et al., 2021), or over high-uncertainty domains (Danesh et al., 8 Jul 2025).
Early domain randomization methods used blind uniform sampling over configurable simulation parameters (Tobin et al., 2017), but this approach, while yielding robustness at scale, can be inefficient, requiring large sample sets to sufficiently cover rare, adversarial, or otherwise informative regions. Adaptive sampling and adversarial variants of domain randomization (ADR) address this inefficiency by integrating closed-loop policies or statistical inference to steer the underlying data generation or simulator randomization toward those samples which maximize error correction, uncertainty reduction, or general policy improvement (Khirodkar et al., 2018, Mehta et al., 2019, Mozifian et al., 2019).
2. Algorithms and Mathematical Formulations
Uniform Domain Randomization (UDR)
The canonical approach trains on data sampled independently from a uniform prior over simulation parameters (lighting, textures, dynamics coefficients, etc.) (Tobin et al., 2017). The underlying optimization is typically empirical risk minimization, with loss

$$\mathcal{L}(\theta) \;=\; \mathbb{E}_{\xi \sim \mathcal{U}(\Xi)}\, \mathbb{E}_{(x, y) \sim p(\cdot \mid \xi)} \big[ \ell\big(f_\theta(x), y\big) \big]$$

for vision-based regression, or cumulative RL returns averaged over the randomization distribution for control.
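To make this concrete, a minimal sketch of a UDR training loop is shown below; the parameter ranges and the `simulator.render` and `train_step` interfaces are hypothetical placeholders, not APIs from the cited work.

```python
import random

# Hypothetical ranges for a few commonly randomized simulator parameters.
PARAM_RANGES = {
    "light_intensity": (0.2, 2.0),
    "texture_id": (0, 499),        # integer index into a texture library
    "friction": (0.4, 1.2),
    "object_mass_kg": (0.05, 0.50),
}

def sample_uniform_params():
    """Draw one simulator configuration xi from the uniform prior U(Xi)."""
    params = {}
    for name, (lo, hi) in PARAM_RANGES.items():
        if isinstance(lo, int):
            params[name] = random.randint(lo, hi)
        else:
            params[name] = random.uniform(lo, hi)
    return params

def train_udr(model, simulator, train_step, num_iters=10_000):
    """Empirical risk minimization averaged over the randomization prior."""
    for _ in range(num_iters):
        xi = sample_uniform_params()      # xi ~ U(Xi)
        batch = simulator.render(xi)      # synthetic (x, y) pairs generated under xi
        train_step(model, batch)          # one optimization step on the task loss
    return model
```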
Adaptive / Adversarial Domain Randomization (ADR, Active DR)
ADR replaces the fixed sampling strategy with a learnable policy over parameters. The policy is trained to generate simulation parameters that are maximally challenging, i.e., that maximize the learner's loss (equivalently, the negative of its reward); formally:

$$\max_{\phi}\; \mathbb{E}_{\xi \sim \pi_\phi} \big[ \mathcal{L}(f_\theta; \xi) \big],$$

where $\mathcal{L}$ is the learner's loss and $f_\theta$ its current model (Khirodkar et al., 2018). In Active Domain Randomization (Mehta et al., 2019), an RL agent samples environment parameters $\xi$, evaluates their informativeness via a discriminator $D_\psi$, and updates its sampling policy using variants of Stein Variational Policy Gradient:

$$\phi_i \;\leftarrow\; \phi_i + \frac{\epsilon}{N} \sum_{j=1}^{N} \Big[ \nabla_{\phi_j} J(\phi_j)\, k(\phi_j, \phi_i) + \nabla_{\phi_j} k(\phi_j, \phi_i) \Big],$$

where $J(\phi_j)$ is the discriminator-based return of particle $j$ and $k(\cdot,\cdot)$ a positive-definite kernel.
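The following sketch illustrates the adversarial sampling loop under simplifying assumptions: the adversary is a diagonal-Gaussian policy over simulator parameters, its reward is the learner's loss on the proposed environment, and the update is a plain score-function (REINFORCE) step rather than the SVPG particle update of Mehta et al. (2019); `evaluate_loss` and `train_on` are hypothetical.

```python
import numpy as np

class GaussianParamPolicy:
    """Adversarial sampling policy over simulator parameters (diagonal Gaussian)."""

    def __init__(self, dim, lr=1e-2):
        self.mu = np.zeros(dim)       # phi: mean of proposed parameters
        self.log_std = np.zeros(dim)  # phi: log std of proposed parameters
        self.lr = lr

    def sample(self):
        return self.mu + np.exp(self.log_std) * np.random.randn(len(self.mu))

    def update(self, xis, rewards):
        """REINFORCE step: make parameters that caused high learner loss more likely."""
        adv = np.asarray(rewards) - np.mean(rewards)   # baseline-subtracted reward
        std2 = np.exp(2 * self.log_std)
        grad_mu = np.zeros_like(self.mu)
        grad_log_std = np.zeros_like(self.log_std)
        for xi, a in zip(xis, adv):
            diff = xi - self.mu
            grad_mu += a * diff / std2                  # d/d mu of log N(xi | mu, std)
            grad_log_std += a * (diff**2 / std2 - 1.0)  # d/d log_std of log N(xi | mu, std)
        self.mu += self.lr * grad_mu / len(xis)
        self.log_std += self.lr * grad_log_std / len(xis)

def adr_round(policy, model, evaluate_loss, train_on, batch=16):
    """One ADR round: the adversary proposes parameters, the learner trains on them."""
    xis = [policy.sample() for _ in range(batch)]
    losses = [evaluate_loss(model, xi) for xi in xis]  # learner loss == adversary reward
    policy.update(xis, losses)                         # steer sampling toward hard regions
    train_on(model, xis)                               # learner trains on the proposed envs
```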
Entropy Maximization and Constrained Optimization
Methods such as DORAEMON (Tiboni et al., 2023) formalize domain randomization as a constrained entropy maximization problem:

$$\max_{\nu}\; \mathcal{H}\big(p_\nu\big) \quad \text{s.t.} \quad \mathbb{E}_{\xi \sim p_\nu}\big[ \mathbb{1}\{\text{success}(\pi_\theta; \xi)\} \big] \;\ge\; \alpha,$$

where $p_\nu$ is the parameter distribution and the left-hand side of the constraint is the (estimated) policy success rate under $p_\nu$. Domain randomization distributions are widened adaptively up to the point where policy performance remains acceptable (success rate threshold $\alpha$); if performance drops, a backtracking step resets the distribution to maintain training stability (Tiboni et al., 2023).
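The widening/backtracking logic can be sketched as follows, assuming a uniform-interval randomization distribution and a hypothetical `estimate_success_rate`; DORAEMON itself maximizes the entropy of a parametric distribution under a trust-region constraint, so this is only a schematic approximation.

```python
import numpy as np

def adaptive_widening(train_policy, estimate_success_rate,
                      center, init_halfwidth, max_halfwidth,
                      alpha=0.5, grow=1.2, num_rounds=50):
    """Widen a uniform randomization interval while the success rate stays above alpha.

    center, init_halfwidth, max_halfwidth: per-parameter numpy arrays.
    """
    halfwidth = init_halfwidth.copy()
    for _ in range(num_rounds):
        lo, hi = center - halfwidth, center + halfwidth
        train_policy(lo, hi)                     # train under the current randomization
        success = estimate_success_rate(lo, hi)  # estimated success rate under p_nu
        if success >= alpha:
            # Constraint satisfied: increase entropy by widening the interval.
            halfwidth = np.minimum(halfwidth * grow, max_halfwidth)
        else:
            # Constraint violated: backtrack to restore training stability.
            halfwidth = halfwidth / grow
    return center - halfwidth, center + halfwidth
```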
Bayesian and Gradient-based Distribution Learning
Frameworks such as BayesSim (Ramos et al., 2019), BayRn (Muratore et al., 2020), and LSDR (Mozifian et al., 2019) infer parameter posteriors or optimize randomization distribution parameters through Bayesian inference or direct gradients. BayesSim learns a mixture density network $q_\phi(\xi \mid x)$ and forms the approximate posterior

$$\hat{p}(\xi \mid x^{\text{real}}) \;\propto\; \frac{p(\xi)}{\tilde{p}(\xi)}\, q_\phi\big(\xi \mid x = x^{\text{real}}\big)$$

for downstream training, where $p(\xi)$ is the prior and $\tilde{p}(\xi)$ the proposal distribution used to generate the simulated training data. Gradient updates for adaptive domain randomization generally take the form

$$\phi \;\leftarrow\; \phi + \eta\, \mathbb{E}_{\xi \sim p_\phi}\big[ R(\pi_\theta; \xi)\, \nabla_\phi \log p_\phi(\xi) \big],$$

where $R(\pi_\theta; \xi)$ is the policy performance in sampled environment $\xi$ (Mozifian et al., 2019).
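A minimal numerical sketch of this score-function update for a diagonal-Gaussian randomization distribution $p_\phi$ is given below; `rollout_return` is a hypothetical stand-in for evaluating policy performance in a sampled environment, and the entropy/KL regularizer that LSDR uses to keep the distribution from collapsing is omitted.

```python
import numpy as np

def update_randomization_distribution(mu, log_std, rollout_return,
                                      lr=1e-2, num_samples=32):
    """One step of phi <- phi + eta * E_{xi ~ p_phi}[ R(xi) * grad_phi log p_phi(xi) ]."""
    std = np.exp(log_std)
    xis = mu + std * np.random.randn(num_samples, len(mu))  # xi ~ p_phi (diagonal Gaussian)
    returns = np.array([rollout_return(xi) for xi in xis])  # R(pi_theta; xi)
    adv = returns - returns.mean()                          # variance-reducing baseline

    diff = xis - mu
    grad_mu = (adv[:, None] * diff / std**2).mean(axis=0)                 # grad_mu log p_phi
    grad_log_std = (adv[:, None] * (diff**2 / std**2 - 1.0)).mean(axis=0) # grad_log_std log p_phi
    return mu + lr * grad_mu, log_std + lr * grad_log_std
```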
3. Practical Application Domains
Robotics and Control
Domain randomization is widely used to train robotic perception and control policies that generalize to real-world hardware. Vision-based object localization networks trained exclusively on synthetic images with randomized parameters can achieve sub-centimeter accuracy and are robust to distractors and occlusions (Tobin et al., 2017). In manipulation and locomotion, adaptive domain randomization (via Bayesian optimization, adversarial policies, or entropy maximization) enables robust zero-shot and few-shot sim-to-real transfer for robotic arms, quadrupedal locomotion, and ball-in-cup tasks (Ramos et al., 2019, Muratore et al., 2020, Tiboni et al., 2023). Fine-tuning strategies (e.g., as in BayRnTune (Huang et al., 2023)) further accelerate policy adaptation to new or perturbed environments, often yielding performance within the range of “oracle” policies with substantially less data.
Vision and Semantics
Frequency-space domain randomization (FSDR) shifts augmentation to frequency components: only domain-variant features (DVFs) are randomized, while domain-invariant features (DIFs) encoding semantic structure are preserved, yielding improved domain generalization and minimal loss of important scene details (Huang et al., 2021). Instance-driven mixed sampling and guidance training methods further refine data-centric adaptation in segmentation and action detection (Lu et al., 2022, Zhou et al., 22 Mar 2024).
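As a simplified illustration of frequency-space augmentation in this spirit, the sketch below perturbs only the Fourier amplitude of an image (style-like, domain-variant information) while preserving its phase (structure-like, domain-invariant information); FSDR itself operates on DCT bands and explicitly identifies domain-variant versus domain-invariant components, so this is an approximation rather than the published method.

```python
import numpy as np

def frequency_space_randomize(image, strength=0.3, rng=None):
    """Randomize Fourier amplitudes per channel while keeping phase untouched.

    image: float array of shape (H, W, C) with values in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty_like(image)
    for c in range(image.shape[2]):
        spectrum = np.fft.fft2(image[..., c])
        amplitude, phase = np.abs(spectrum), np.angle(spectrum)
        # Multiplicative noise on the amplitude only; phase carries scene structure.
        noise = np.clip(1.0 + strength * rng.standard_normal(amplitude.shape), 0.1, None)
        out[..., c] = np.real(np.fft.ifft2(noise * amplitude * np.exp(1j * phase)))
    return np.clip(out, 0.0, 1.0)
```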
Computational Science and UQ
Adaptive sampling strategies are foundational in surrogate modeling and uncertainty quantification (UQ), especially in the context of function approximation on irregular or unknown domains (Adcock et al., 2022). Algorithms such as ASGD and ASUD use Christoffel-function-based measures and domain learning strategies to minimize wasted samples, guaranteeing near-optimal (log-linear in the dimension of the approximation space) sample complexity and robust error bounds even when the active domain is only partially identified.
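A minimal one-dimensional sketch of Christoffel-function-weighted sampling on a known domain is shown below (the domain-learning machinery of ASUD, which handles samples falling outside the partially identified domain, is omitted): points are drawn from the density proportional to $K_n(x) = \sum_{k=1}^{n} |\phi_k(x)|^2$ for the first $n$ orthonormal Legendre polynomials, and a weighted least-squares fit uses weights $n/K_n(x_i)$.

```python
import numpy as np
from numpy.polynomial import legendre

def orthonormal_legendre_vander(x, n):
    """Values of the first n Legendre polynomials, orthonormal w.r.t. dx/2 on [-1, 1]."""
    V = legendre.legvander(x, n - 1)                  # columns P_0 .. P_{n-1}
    return V * np.sqrt(2 * np.arange(n) + 1)

def christoffel_sample(n, m, rng):
    """Draw m points from the density K_n(x)/n via rejection sampling."""
    samples = []
    while len(samples) < m:
        x = rng.uniform(-1.0, 1.0, size=4 * m * n)    # uniform proposals
        K = (orthonormal_legendre_vander(x, n) ** 2).sum(axis=1)
        accept = rng.uniform(size=x.size) < K / n**2  # K_n(x) <= n^2 on [-1, 1]
        samples.extend(x[accept].tolist())
    return np.array(samples[:m])

def weighted_least_squares_fit(f, n, m, seed=0):
    """Fit an n-term Legendre expansion to f from m adaptively drawn samples."""
    rng = np.random.default_rng(seed)
    x = christoffel_sample(n, m, rng)
    Phi = orthonormal_legendre_vander(x, n)
    w = n / (Phi ** 2).sum(axis=1)                    # weights n / K_n(x_i)
    coeffs, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * Phi, np.sqrt(w) * f(x), rcond=None)
    return coeffs

# Example usage: coeffs = weighted_least_squares_fit(np.exp, n=10, m=40)
```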
4. Performance, Robustness, and Sample Efficiency
Adaptive domain randomization and sampling consistently improve both sample efficiency and generalization. ADR and Active-DR require fewer synthetic samples to achieve comparable or better generalization than uniform DR by targeting regions of simulation where the learner is weak (e.g., truncated or occluded objects in detection tasks), and result in more rapid robustness improvements (Khirodkar et al., 2018, Mehta et al., 2019). Bayesian domain randomization with closed-loop adaptation to real-world returns outperforms baselines in both transferability and variance while reducing prior knowledge reliance (Muratore et al., 2020). In the benchmark problem of LQR, it is rigorously shown that domain randomization with the correct sampling distribution achieves the optimal asymptotic rate of excess cost decay—matching certainty equivalence—whereas robust control is superior in the low-data regime but generally more conservative (Fujinami et al., 17 Feb 2025). On industrial vision tasks, real-time adaptive pipelines (e.g., BlendTorch (Heindl et al., 2020)) both increase mean detection precision and decrease run-to-run variance compared to fixed synthetic or real datasets.
An illustrative summary of representative methods and their performance characteristics:
| Method | Domain | Sample Selection | Data Efficiency | Generalization |
|---|---|---|---|---|
| Uniform DR | Robotics/Vision | Uniform random | Low | Variable |
| ADR / Active-DR | RL, Control | Adversarial/informative | High | High |
| Bayesian/Gradient DR | Robotics | Posterior-guided/BO | High | High |
| FSDR | Vision | Frequency-adaptive | Medium | High |
| ASUD (Unknown Domains) | UQ/Sci Comp | Domain-adaptive | High | High |
5. Trade-Offs, Limitations, and Open Research Directions
The trade-offs in adaptive sampling and domain randomization are nuanced:
- Sample Efficiency vs. Coverage: Adaptive approaches target challenging cases for rapid improvement but risk undercoverage of rare but important “corner cases” if the adaptation process is miscalibrated or greedy (Mehta et al., 2019, Khirodkar et al., 2018).
- Conservatism vs. Exploration: Robust control prioritizes safety but may underperform on average; unconstrained entropy maximization risks “overshooting” into intractably difficult settings, whereas constrained variants like DORAEMON (Tiboni et al., 2023) manage this by enforcing a success rate threshold.
- Computational Overheads: Bayesian optimization and adversarial RL policies add overhead via meta-optimization or large-scale parallel simulation (Ramos et al., 2019, Antonova et al., 2021), but in GPU-accelerated environments this becomes tractable.
- Scalability and High-Dimensionality: Gaussian Process models in Bayesian DR can face scaling difficulties in high-dimensional parameter spaces, and open-loop simulation replay can diverge if behavior is poorly synchronized with real data (Tiboni et al., 2022).
Open questions highlighted in recent work include: theoretical tightening of burn-in rates and convergence, extension of gradient-based DR optimization to nonlinear or misspecified models (Fujinami et al., 17 Feb 2025), development of more data- and compute-efficient posterior inference over large parameter spaces, robust multi-factor and multi-modal domain identification, and integration of adaptive domain randomization into continual or lifelong learning contexts. The design of informative reward/uncertainty signals for identifying “hard” samples or domains also remains a fertile area for investigation (Mehta et al., 2019, Danesh et al., 8 Jul 2025).
6. Safety, Verification, and Deployment
Domain randomization and adaptive sampling alone do not guarantee safety during sim-to-real transfer. The introduction of uncertainty-aware RL frameworks, exemplified by ensemble critic variance for OOD detection and adaptive policy gating, is a significant step toward practical, certifiable deployment (Danesh et al., 8 Jul 2025). By requiring ensemble agreement before real-world deployment and refocusing training on high-uncertainty state-action regions, these frameworks reduce catastrophic failures resulting from unmodeled environment shifts or simulator misalignment. However, representative target domain datasets and control over simultaneous multi-parameter randomization remain current challenges.
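As a schematic illustration of such gating (not the specific architecture of the cited work), the sketch below executes the learned policy only where an ensemble of critics agrees and otherwise falls back to a conservative controller; `policy`, `critics`, and `safe_action` are hypothetical callables.

```python
import numpy as np

def gated_action(state, policy, critics, safe_action, variance_threshold=0.05):
    """Execute the learned policy only where the critic ensemble agrees.

    critics: list of callables Q_i(state, action) -> float.
    High variance across the ensemble is treated as an out-of-distribution signal.
    """
    action = policy(state)
    q_values = np.array([q(state, action) for q in critics])
    if q_values.var() > variance_threshold:
        # Ensemble disagreement: fall back to a conservative controller; the flagged
        # state can also be logged so training refocuses on this high-uncertainty region.
        return safe_action(state), True
    return action, False
```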
7. Broader Impacts and Future Perspectives
Adaptive sampling and domain randomization have catalyzed widespread progress in fields requiring robust generalization from limited or synthetic data, enabling transfer from simulation to diverse, uncertain, or costly domains across robotics, autonomous driving, computational science, and vision-based perception. Their future development will likely focus on the unification of active learning, probabilistic inference, and task-driven regularization, as well as extension to multi-agent, non-stationary, or evolving domain settings. Methodologies that couple principled uncertainty quantification with efficient, adaptive data generation and explicit success thresholds are poised to form the backbone of future safe, scalable, and trustworthy real-world autonomy.