Active Sampling with Exploration (ASE)

Updated 20 April 2026

Active Sampling with Exploration (ASE) is a framework that adaptively selects queries by combining uncertainty metrics with exploration strategies to maximize learning efficiency.
It employs acquisition functions based on measures like entropy, Fisher information, and sensitivity gradients to balance exploiting known data with sampling unexplored regions.
ASE has demonstrated practical gains across active learning, reinforcement learning, and robotics, reducing labeling costs and improving system identification with robust theoretical guarantees.

Active Sampling with Exploration (ASE) refers to a family of methodologies across machine learning, reinforcement learning, Monte Carlo inference, and simulation-based modeling that address the core challenge of balancing the exploitation of currently informative or uncertain regions of an input or parameter space with explicit exploration of regions that remain underrepresented or unknown. The central tenet of ASE is to adaptively and actively choose queries, samples, or interventions to maximize the rate of learning, generalization, or estimation efficiency, as opposed to passively collecting data or relying on fixed or uniform sampling strategies. Contemporary ASE frameworks span uncertainty-driven querying in active learning, reinforcement learning, importance sampling, robotics, simulation-based system identification, and neural rendering, each contributing domain-specific acquisition functions, algorithmic mechanisms, and theoretical analyses.

1. Fundamental Principles and General Mathematical Frameworks

ASE operationalizes exploration–exploitation tradeoffs by defining acquisition functions or reward criteria that combine (i) a measure of "informativeness" or "uncertainty," and (ii) a structural component for encouraging novel, diverse, or spatially dispersed sampling. These acquisition functions may be constructed from epistemic uncertainty, loss or sensitivity gradients, Fisher information, or information-theoretic metrics such as entropy or information gain.

For instance, in active learning with evidential models, the "Klir uncertainty" score

$\mathcal{U}_m(x) = \lambda N(m_x) + (1-\lambda) D(m_x)$

balances non-specificity $N$ (exploration) and discord $D$ (exploitation) via an explicit trade-off parameter $\lambda$ (Hoarau et al., 2023). In adaptive importance sampling, the proposal mixture weights are adaptively re-weighted using an optimism penalty

$q_{a,t} = \frac{\hat Z_{a,t-1}+\sigma_{a,t-1}}{\sum_{b} (\hat Z_{b,t-1}+\sigma_{b,t-1})}$

with an uncertainty bonus $\sigma_{a,t}$ that shrinks with more samples in cell $a$ (Lu et al., 2018). In reinforcement learning, reward functions may incorporate both expected improvement (exploitation) and state-action novelty, coverage, or information gain (exploration) (Somayazulu et al., 2024, Zheng et al., 31 Oct 2025).

ASE methodologies commonly utilize multi-armed bandit principles, variational Bayesian formulations, MCMC or mirror ascent optimization, and, in deep learning, ensemble posterior estimation or Thompson sampling surrogates. Across settings, the ASE paradigm seeks explicit control over the trade-off between exploitation and global coverage, with domain-specific theoretical guarantees and empirical validation on coverage, convergence speed, or estimation efficiency.
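To make the optimism-weighted proposal rule above concrete, the following is a minimal sketch of how the mixture weights $q_{a,t}$ could be computed from per-cell estimates $\hat Z_{a,t-1}$ and bonuses $\sigma_{a,t-1}$. The function names and the specific bonus form (`scale / sqrt(n_a + 1)`) are assumptions made for illustration; the text only states that the bonus shrinks as a cell receives more samples.

```python
import numpy as np

def optimistic_mixture_weights(z_hat, sigma):
    """q_a = (Z_a + sigma_a) / sum_b (Z_b + sigma_b)."""
    scores = np.asarray(z_hat, dtype=float) + np.asarray(sigma, dtype=float)
    return scores / scores.sum()

def count_based_bonus(counts, scale=1.0):
    # Assumed bonus form for illustration only: it decreases as the
    # number of samples n_a in a cell grows.
    counts = np.asarray(counts, dtype=float)
    return scale / np.sqrt(counts + 1.0)

# Three cells: a well-sampled high-mass cell, a lightly sampled cell,
# and a cell that has never been sampled.
z_hat = np.array([0.6, 0.3, 0.1])
counts = np.array([100, 10, 0])
q = optimistic_mixture_weights(z_hat, count_based_bonus(counts))
```

In this toy setting the unsampled cell receives a larger share of the proposal mass than its current estimate alone would give it, which is the exploration effect the bonus is meant to produce.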

2. ASE Algorithms in Active Learning and Uncertainty Sampling

ASE is foundational in modern active learning frameworks where the goal is to label the most informative or underrepresented points in a pool or stream:

Klir Uncertainty / Evidential Active Sampling: Uses a combination of non-specificity (Hartley-like, reflecting imprecision among possible classes) and discord (generalized Shannon entropy, measuring conflict) in Dempster–Shafer mass functions. The tunable $\lambda$ in

$\mathcal{U}_m(x) = \lambda \sum_{A \subseteq \Omega} m_x(A) \log_2 |A| +(1-\lambda)\left[-\sum_{A \subseteq \Omega} m_x(A) \log_2\mathrm{BetP}(A)\right]$

directly interpolates between pure exploration ($\lambda = 1$, non-specificity only) and pure exploitation ($\lambda = 0$, discord only) (Hoarau et al., 2023). Empirically, setting $\lambda\in[0.2,0.4]$ yields the best performance across diverse datasets.
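The Klir uncertainty score above can be evaluated directly from a mass function. The sketch below assumes a mass function is stored as a dictionary mapping non-empty focal sets (`frozenset`) to masses that sum to one; this representation and the function names are illustrative choices, not taken from the cited work.

```python
from math import log2

def pignistic(mass):
    """BetP on singletons: each focal set shares its mass equally among its elements."""
    betp = {}
    for focal, m in mass.items():
        for w in focal:
            betp[w] = betp.get(w, 0.0) + m / len(focal)
    return betp

def klir_uncertainty(mass, lam):
    """lam * non-specificity + (1 - lam) * discord, both in bits."""
    betp = pignistic(mass)
    active = [(A, m) for A, m in mass.items() if m > 0]
    nonspec = sum(m * log2(len(A)) for A, m in active)
    discord = -sum(m * log2(sum(betp[w] for w in A)) for A, m in active)
    return lam * nonspec + (1.0 - lam) * discord

# Total ignorance: all mass on the whole frame (pure non-specificity).
vacuous = {frozenset({"a", "b"}): 1.0}
# Maximal conflict: mass split between two singletons (pure discord).
conflicting = {frozenset({"a"}): 0.5, frozenset({"b"}): 0.5}

u_vacuous = klir_uncertainty(vacuous, lam=0.3)
u_conflicting = klir_uncertainty(conflicting, lam=0.3)
```

For a two-class frame, the vacuous mass has one bit of non-specificity and no discord, while the evenly split Bayesian mass has one bit of discord and no non-specificity, so the two scores reduce to $\lambda$ and $1-\lambda$ respectively.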

Graph-based ASE (PWLL-$\tau$): In graph-based semi-supervised settings, uncertainty is given by the minimum norm of the Poisson Reweighted Laplacian solution, with an explicit $\tau$-regularization to force exponential decay of influence away from labeled nodes. ASE is realized by querying points with the smallest solution norm, thus systematically covering new clusters before focusing on decision boundaries (Miller et al., 2022). Rigorous PDE analysis proves that all clusters are visited before exploitation dominates.

Deep Ensembles and Thompson Sampling: Pool-based active learning for computer vision leverages deep ensembles (approximate Bayesian posterior) and either variation ratio or ensemble diversity as an uncertainty measure. ASE is achieved by selecting points that maximize ensemble disagreement, and posterior-sampling (i.e., approximate Thompson sampling) integrates geometric and uncertainty-based exploration (Mohamadi et al., 2022).
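As an illustration of ensemble-disagreement querying, the sketch below computes the variation ratio (one minus the fraction of ensemble members voting for the modal class) and picks the pool points with the highest scores. The array layout `(members, points, classes)` and the function names are assumptions for this example; it does not reproduce the cited pipeline.

```python
import numpy as np

def variation_ratio(member_probs):
    """member_probs: array of shape (M, N, C) with class probabilities per member."""
    votes = member_probs.argmax(axis=-1)               # (M, N) hard votes
    n_members, n_points = votes.shape
    n_classes = member_probs.shape[-1]
    counts = np.zeros((n_points, n_classes), dtype=int)
    for c in range(n_classes):
        counts[:, c] = (votes == c).sum(axis=0)        # votes per class per point
    return 1.0 - counts.max(axis=1) / n_members

def select_queries(member_probs, k):
    """Indices of the k pool points with the largest disagreement."""
    scores = variation_ratio(member_probs)
    return np.argsort(-scores, kind="stable")[:k]

# Three members, two pool points, two classes.
probs = np.array([
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.6, 0.4], [0.1, 0.9]],
])
scores = variation_ratio(probs)
query = select_queries(probs, k=1)
```

Here all members agree on the first point, while the second point splits the vote two to one, so the second point is the one selected for labeling.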

3. ASE in Simulation, System Identification, and Robotics

ASE has been successfully deployed for efficient model and parameter identification in both simulated and real-world robotic systems:

Uncertainty-driven Exploration for Dynamics Learning (SoftAE): In soft robots, ASE guides the agent to maximize epistemic uncertainty as estimated by an ensemble of neural transition models, explicitly planning control sequences that drive the system into poorly understood regions of the state–action space. The acquisition is defined in terms of the variance of the model ensemble's predictions (Zheng et al., 31 Oct 2025). Planning uses the cross-entropy method (CEM) or MPC under an optimistic transition model, imitating "deep exploration." The outcome is near-uniform coverage, improved zero-shot generalization, and substantial gains in dynamics prediction accuracy.
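A minimal sketch of variance-driven planning is given below. It assumes a toy ensemble of linear transition models and replaces CEM/MPC with plain random shooting for brevity; the model form, the uniform action range, and all function names are hypothetical and are not the SoftAE implementation.

```python
import numpy as np

def disagreement(models, state, action):
    """Return (summed variance of next-state predictions, mean prediction)."""
    preds = np.stack([A @ state + B @ action for A, B in models])  # (M, state_dim)
    return preds.var(axis=0).sum(), preds.mean(axis=0)

def plan_exploratory_actions(models, state, horizon, n_candidates, action_dim, rng):
    """Random shooting: keep the action sequence with the largest total disagreement."""
    best_score, best_seq = -np.inf, None
    for _ in range(n_candidates):
        seq = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, score = state.copy(), 0.0
        for a in seq:
            var, s = disagreement(models, s, a)  # roll forward with the ensemble mean
            score += var
        if score > best_score:
            best_score, best_seq = score, seq
    return best_seq, best_score

rng = np.random.default_rng(0)
# Hypothetical ensemble: five perturbed linear models s' = A s + B a.
models = [(np.eye(2) + 0.1 * rng.normal(size=(2, 2)), rng.normal(size=(2, 1)))
          for _ in range(5)]
best_seq, best_score = plan_exploratory_actions(
    models, state=np.zeros(2), horizon=4, n_candidates=64, action_dim=1, rng=rng)
```

When all ensemble members agree, every candidate scores zero and the planner has no preference, which matches the intuition that exploration is driven only by epistemic disagreement.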

Active System Identification for Sim2Real Transfer (SPI-Active): For legged robots, ASE is instantiated via a two-stage pipeline: initial sampling-based parameter identification with massive simulator rollouts, followed by policy-driven active exploration that optimizes the Fisher Information Matrix (FIM) of the system parameters. The agent actively designs command sequences that maximize the expected trace or log-determinant of the FIM, ensuring that the collected trajectories are maximally informative about sensitive or poorly observed parameters (Sobanbabu et al., 20 May 2025). This leads to demonstrable improvements in policy transfer and open-loop prediction.

Acoustic Environment Modeling via Information-Gain Exploration: In the ActiveRIR framework, ASE drives a mobile agent to balance exploration (area/novelty coverage) with exploitation (sampling at high information-gain poses) to efficiently map room impulse responses. The reward function combines three terms: the reduction in global acoustic model error, a reward for covering unseen regions, and a reward for pose novelty (Somayazulu et al., 2024). Ablation confirms the necessity of all terms for optimal learning.
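The Fisher-information criterion described for SPI-Active above can be illustrated on a much simpler model. The sketch below assumes a linear-Gaussian observation model $y = X\theta + \varepsilon$ with known noise standard deviation, for which the FIM is $X^\top X / \sigma^2$, and ranks candidate excitation sequences by log-determinant. The candidate matrices, the ridge term, and the function names are assumptions for illustration only.

```python
import numpy as np

def fisher_information(features, noise_std):
    """FIM of theta for y = features @ theta + N(0, noise_std^2) noise."""
    X = np.asarray(features, dtype=float)
    return X.T @ X / noise_std**2

def log_det_score(fim, ridge=1e-9):
    # A small ridge keeps the log-determinant finite when a parameter
    # is not excited at all by the candidate sequence.
    _, logdet = np.linalg.slogdet(fim + ridge * np.eye(fim.shape[0]))
    return logdet

def most_informative(candidates, noise_std=0.1):
    """Index of the candidate with the largest log-det FIM, plus all scores."""
    scores = [log_det_score(fisher_information(X, noise_std)) for X in candidates]
    return int(np.argmax(scores)), scores

# Candidate 0 only excites the first parameter; candidate 1 excites both.
candidates = [
    np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]),
    np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
]
best, scores = most_informative(candidates)
```

The log-determinant strongly penalizes sequences that leave any parameter direction unobserved, whereas the trace criterion only rewards total information; which one is preferable depends on whether poorly observed parameters matter for the downstream task.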

4. ASE Approaches in Monte Carlo and Bayesian Inference

ASE methods have motivated new algorithms in adaptive importance sampling, amortized probabilistic inference, and generative modeling:

Partition-based and Hierarchical Adaptive Importance Sampling (Daisee, HiDaisee): By partitioning the proposal space and using optimism-driven reweighting, Daisee adaptively focuses computation on both high-mass and under-sampled regions:

$q_{a,t} = \frac{\hat Z_{a,t-1}+\sigma_{a,t-1}}{\sum_{b} (\hat Z_{b,t-1}+\sigma_{b,t-1})}$

with $\sigma_{a,t}$ an uncertainty bonus that shrinks as more samples are drawn in cell $a$. Sublinear cumulative pseudo-regret bounds are proven (Lu et al., 2018). A hierarchical extension (HiDaisee) dynamically increases resolution in complex regions.

Trajectory Exploration in Generative Flow Networks (GFlowNets): Here, ASE corresponds to posterior-sampling over model parameters (Thompson Sampling) to sample trajectories for training, using bootstrap-ensemble approximations of the posterior. Empirical metrics of convergence (e.g., error relative to the target distribution, number of discovered modes) show substantial acceleration relative to prior methods (Rector-Brooks et al., 2023).

5. ASE in Reinforcement Learning, Bandits, and Markov Decision Processes

ASE methodology has been foundational in exploration under bandit and MDP frameworks, with explicit algorithmic and theoretical constructs:

Mirror Ascent in Fixed-Confidence Bandits: The sampling rule is derived by online mirror ascent on a dual objective encoding the hardness of distinguishing alternative models, regularized via entropy and combined with forced exploration (Ménard, 2019). This achieves asymptotic optimality in expected sample complexity.

Active Exploration in MDPs (FW-AME): ASE for MDPs optimizes a mean-square error criterion for state means under unknown heteroscedastic noise, alternating optimistic linearizations (Frank–Wolfe) and policy deployment. A regret bound in the number of steps is established, with an extra cost proportional to the inverse spectral gap of the MDP (Tarbouriech et al., 2019).
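To illustrate the entropy-regularized mirror ascent with forced exploration mentioned for the bandit setting, the sketch below uses the standard exponentiated-gradient update on the probability simplex and then mixes the result with the uniform distribution. This is a generic sketch, not the exact rule of the cited paper: the toy objective $F(w) = \sum_a c_a \log w_a$, the step size, and the mixing rate are all assumptions.

```python
import numpy as np

def mirror_ascent_step(weights, gradient, step_size, explore):
    """Exponentiated-gradient step on the simplex, then mix with uniform."""
    logits = np.log(weights) + step_size * gradient
    logits -= logits.max()                      # numerical stability
    updated = np.exp(logits)
    updated /= updated.sum()
    k = len(weights)
    return (1.0 - explore) * updated + explore / k

def run(c, n_steps, step_size=0.1, explore=0.0):
    """Maximize the toy objective sum_a c_a * log(w_a), whose gradient is c / w."""
    w = np.full(len(c), 1.0 / len(c))           # start from uniform sampling
    for _ in range(n_steps):
        w = mirror_ascent_step(w, c / w, step_size, explore)
    return w

c = np.array([0.5, 0.3, 0.2])
w_pure = run(c, n_steps=500)                    # no forced exploration
w_forced = run(c, n_steps=500, explore=0.05)    # every arm keeps >= explore / K
```

Without forced exploration the iterates approach the maximizer of the toy objective, which is proportional to `c`; with forced exploration every arm retains at least `explore / K` sampling probability, at the cost of a small bias away from that maximizer.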

6. ASE for Unbounded and Structured Input Spaces

ASE techniques extend naturally to problems where the feasible set or relevant data domain is not fixed in advance:

Active Expansion Sampling (AES): Targets identification of feasible domains over unbounded input spaces, alternating exploitation (local boundary refinement with GP models) and exploration (systematic outward expansion) phases. An explicit misclassification-loss guarantee is maintained within the explored region, regardless of the query budget (Chen et al., 2017). Runtime and coverage outperform fixed-bound competitors, especially in the presence of multiple disconnected feasible "islands."

7. ASE for Neural Rendering and High-Dimensional Synthesis

ASE is central to recent advances in neural global illumination and photorealistic rendering:

Active Exploration via MCMC for Scene Sampling: In variable-scene neural rendering, ASE is implemented through a mix of global (uniform) and local (Gaussian) MCMC proposals in the scene parameter space, with an informativeness function that combines the training loss and its gradient norm, and a greedy Metropolis–Hastings acceptance policy ensuring rapid adaptation to hard-to-learn regions (Diolatzis et al., 2022). A self-tuning sample reuse mechanism and progressive resolution enhancement further accelerate convergence.
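The mixed global/local proposal scheme can be sketched as follows. This example assumes scene parameters normalized to the unit cube, a hypothetical positive `informativeness` function standing in for the loss-and-gradient score, and the standard Metropolis–Hastings acceptance ratio rather than the greedy variant of the cited work.

```python
import numpy as np

def propose(x, rng, p_global, local_std):
    """Global uniform jump with probability p_global, else a local Gaussian move."""
    if rng.random() < p_global:
        return rng.uniform(0.0, 1.0, size=x.shape)
    return x + rng.normal(0.0, local_std, size=x.shape)

def explore_scene_space(informativeness, x0, n_steps, rng,
                        p_global=0.2, local_std=0.05):
    x = np.asarray(x0, dtype=float)
    fx = informativeness(x)
    samples = []
    for _ in range(n_steps):
        cand = propose(x, rng, p_global, local_std)
        # Proposals outside the unit cube are rejected, which keeps the
        # mixture proposal symmetric so the plain ratio fc / fx is valid.
        if np.all((cand >= 0.0) & (cand <= 1.0)):
            fc = informativeness(cand)
            if rng.random() < min(1.0, fc / fx):
                x, fx = cand, fc
        samples.append(x.copy())
    return np.array(samples)

def informativeness(x):
    # Hypothetical score: a hard-to-learn region near 0.8 plus a small floor
    # so that every configuration keeps a nonzero sampling probability.
    return 0.05 + float(np.exp(-np.sum((x - 0.8) ** 2) / (2 * 0.05 ** 2)))

rng = np.random.default_rng(0)
samples = explore_scene_space(informativeness, x0=[0.1], n_steps=5000, rng=rng)
```

The global proposals let the chain discover the informative region from a poor starting point, while the local proposals concentrate subsequent samples around it.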

Application Domain	Acquisition/Exploration Principle	Core Reference
Pool-based active learning	Klir uncertainty (N, D terms), ensemble posterior	(Hoarau et al., 2023, Mohamadi et al., 2022)
Graph/semi-supervised active learning	Minimum-norm of Laplacian, $\tau$-exponential decay	(Miller et al., 2022)
Simulation-based SysID/robotics	Fisher information maximization in policy space; ensemble-variance exploration	(Sobanbabu et al., 20 May 2025, Zheng et al., 31 Oct 2025)
Neural rendering	Loss-gradient-norm in variable scene MCMC	(Diolatzis et al., 2022)
Adaptive importance sampling/inference	Optimism bonus in proposal partition	(Lu et al., 2018)
GFlowNets/trajectory learning	Thompson sampling over ensemble policy posteriors	(Rector-Brooks et al., 2023)
RL/bandits/MDPs	Mirror ascent, FW surrogates, mixing gap	(Ménard, 2019, Tarbouriech et al., 2019)
Unbounded domain search	Alternating GP-exploitation and expansion	(Chen et al., 2017)

8. Impact, Domain-Specific Gains, and Practical Outcomes

ASE has led to substantial gains in sample efficiency, generalization, and computational tractability across many domains:

In scientific and engineering applications (robotics, simulation-to-real transfer), ASE yields 42–63% performance gains and more accurate system identification (Sobanbabu et al., 20 May 2025).
In active learning, ASE reduces labeling costs by up to 82%, consistently outperforms classic uncertainty sampling and baseline acquisition heuristics, and achieves near-supervised performance with a fraction of the labels (Mohamadi et al., 2022, Hoarau et al., 2023).
In global illumination and synthetic data generation, ASE reduces mean absolute percentage error (MAPE) by 2–5× relative to uniform sampling, and captures rare but visually significant caustics and specular effects (Diolatzis et al., 2022).
In adaptive importance sampling, ASE achieves sublinear cumulative pseudo-regret and smooth adaptation to multi-modal or high-variance target densities (Lu et al., 2018).

9. Directions, Theoretical Guarantees, and Tuning

ASE algorithms often admit explicit trade-off control (e.g., $\lambda$ in Klir uncertainty, $\tau$ in Laplacian regularization, ensemble size and noise in deep ensembles, mixing rate in mirror ascent). Theoretical guarantees include:

Regret or convergence bounds (e.g., sublinear cumulative pseudo-regret for Daisee, MSE contraction in MDPs, exponential bounds on exploration coverage in graph-based semi-supervised learning) (Lu et al., 2018, Tarbouriech et al., 2019, Miller et al., 2022).
Sample-complexity quantification and empirical confirmation of exploration coverage, cluster discovery, and robust performance under label or parameter noise.
Empirical ablation studies consistently validate the benefit of explicit exploration terms, ensemble-based uncertainty, and adaptive mixture strategies.

Optimization of ASE hyperparameters is domain- and dataset-dependent; cross-validation or held-out tests are typically used to select trade-off parameters for the best exploration–exploitation balance. Notably, excessive exploration ($\lambda \to 1$ or large optimism bonuses) reduces exploitation accuracy, while pure exploitation is prone to myopic local sampling.

ASE stands as a unifying paradigm for adaptive, uncertainty- and diversity-driven sampling and exploration strategies, with demonstrable impact across the spectrum of data-driven modeling, control, simulation, inference, and synthesis. Its theoretical foundations and robust domain-specific implementations cement its role in advancing sample-efficient, generalizable AI systems.