
LazyDAgger: Efficient Imitation Learning

Updated 2 February 2026
  • LazyDAgger is an imitation learning algorithm that minimizes expert interventions by selectively querying states with high uncertainty.
  • It utilizes dropout-based model ensembles to quantify uncertainty, allowing only the most ambiguous states to trigger expert labeling.
  • Variants further optimize human-in-the-loop control by reducing context switches through asymmetric thresholds and noise injection during supervision.

LazyDAgger is an imitation learning (IL) algorithm designed to reduce expert intervention queries and supervisor burden during interactive training of autonomous policies. Two principal algorithmic streams carry the LazyDAgger label: one based on disagreement-driven querying for expert labels (Haridas et al., 2023), and another focused on reducing supervisor context-switching overhead through optimized human-in-the-loop control delegation (Hoque et al., 2021).

1. Overview and Motivation

Standard Dataset Aggregation (DAgger) algorithms require the expert to provide corrective actions at every state encountered during policy rollouts, resulting in substantial supervisory cost. LazyDAgger (referred to as DADAgger in some literature) addresses this by selective querying: acquiring expert actions only at states where the learned policy exhibits high uncertainty. Independently, an interactive robot learning variant of LazyDAgger focuses on reducing human supervisor context switches, improving both efficiency and overall task success by lengthening intervention segments and injecting noise during expert control. Both approaches aim to conserve expert effort without sacrificing imitation quality, addressing scalability challenges in real-world or costly environments (Haridas et al., 2023, Hoque et al., 2021).

2. Core Algorithms

2.1 Disagreement-Augmented Dataset Aggregation (DADAgger / LazyDAgger)

LazyDAgger in the DADAgger formulation uses dropout-based model ensembles at policy evaluation time to quantify epistemic uncertainty. At each state $s$ visited during a policy rollout, $M$ stochastic forward passes are computed under random dropout, producing actions $\{\hat y_{i,1}(s), \dots, \hat y_{i,M}(s)\}$. The mean action and empirical variance (uncertainty $U(s)$) are

$$\bar y_i(s) = \frac{1}{M} \sum_{m=1}^M \hat y_{i,m}(s), \qquad U(s) = \frac{1}{M} \sum_{m=1}^M \|\hat y_{i,m}(s) - \bar y_i(s)\|^2.$$

Only the top $\alpha\%$ of states by $U(s)$ are selected for expert querying. Dataset aggregation and policy retraining then proceed as in DAgger, but over this reduced subset (Haridas et al., 2023).
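The uncertainty estimate and top-$\alpha\%$ selection above can be sketched as follows. This is an illustrative implementation under our own naming conventions, not code from the paper; it assumes a `policy(state, rng)` callable whose forward pass keeps dropout active at evaluation time.

```python
import numpy as np

def mc_dropout_uncertainty(policy, state, M=10, rng=None):
    """Estimate U(s) from M stochastic (dropout-active) forward passes."""
    rng = rng or np.random.default_rng()
    actions = np.stack([policy(state, rng) for _ in range(M)])  # (M, action_dim)
    mean_action = actions.mean(axis=0)
    # U(s) = (1/M) * sum_m ||y_hat_m(s) - mean||^2
    U = np.mean(np.sum((actions - mean_action) ** 2, axis=-1))
    return mean_action, U

def select_query_states(uncertainties, alpha=20.0):
    """Indices of the top-alpha% most uncertain states, to be expert-labeled."""
    threshold = np.percentile(uncertainties, 100.0 - alpha)
    return [i for i, u in enumerate(uncertainties) if u >= threshold]
```

States not selected by `select_query_states` are simply skipped during dataset aggregation, which is how the query budget is reduced relative to vanilla DAgger.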

2.2 LazyDAgger for Reducing Context Switching

The interactive robot learning variant modifies SafeDAgger by introducing two mechanisms:

  • Asymmetric switching thresholds: two discrepancy cutoffs, an upper $\tau_{\text{sup}}$ and a lower $\tau_{\text{auto}}$, form a hysteresis band for supervisor/autonomous delegation, reducing frequent context switching.
  • Noise injection during supervisor control: when the supervisor acts, actions are drawn from $\mathcal{N}(\pi^*(s), \sigma^2 I)$ rather than strictly $\pi^*(s)$, broadening the visited state distribution to counteract covariate shift.

A discrepancy classifier $f(s) \in [0,1]$ predicts the need for intervention. The policy is updated with data acquired predominantly from supervisor-controlled states or when entering/exiting supervisor mode, producing longer, less frequent segments of supervision (Hoque et al., 2021).
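The hysteresis-band delegation and noise injection can be sketched as below. Names and default thresholds are ours for illustration, not from Hoque et al.'s implementation: control passes to the supervisor when $f(s)$ exceeds $\tau_{\text{sup}}$ and returns to the policy only once $f(s)$ drops below the lower $\tau_{\text{auto}}$, so scores inside the band never trigger a switch.

```python
import numpy as np

def lazydagger_switch(f_s, supervisor_mode, tau_sup=0.7, tau_auto=0.3):
    """Update who is in control given the discrepancy score f(s) in [0, 1]."""
    if not supervisor_mode and f_s > tau_sup:
        return True   # hand control to the supervisor
    if supervisor_mode and f_s < tau_auto:
        return False  # return control to the autonomous policy
    return supervisor_mode  # inside the hysteresis band: no switch

def supervisor_action(expert_action, sigma, rng):
    """Noise-injected supervisor action, drawn from N(pi*(s), sigma^2 I)."""
    return expert_action + sigma * rng.normal(size=np.shape(expert_action))
```

Because the band $[\tau_{\text{auto}}, \tau_{\text{sup}}]$ absorbs small fluctuations in $f(s)$, interventions become longer and less frequent than under a single symmetric threshold.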

3. Theoretical Guarantees and Limitations

No new regret or query-efficiency bounds for LazyDAgger appear in the primary sources. In the DADAgger setting, the standard DAgger no-regret guarantee is recovered when the query fraction is $\alpha = 100\%$. For smaller $\alpha$, performance degrades in proportion to the unqueried out-of-distribution states, but this trade-off is not captured by a formal upper bound in the extant literature (Haridas et al., 2023). In the context-switching variant, overall complexity is described as $O(N \cdot T \cdot C)$ (where $N$ is the number of epochs, $T$ the horizon, and $C$ the cost per forward pass), with no increase relative to SafeDAgger (Hoque et al., 2021).

A plausible implication is that as the fraction of states selected for querying decreases, the learner may receive a less representative distribution of corrective actions, increasing the risk of accumulating blind spots unless uncertainty estimation is accurate and well-calibrated. Similarly, the effectiveness of reducing context switches relies on accurate estimation and tuning of the hysteresis thresholds and noise parameters.

4. Empirical Evaluation

4.1 DADAgger-style LazyDAgger

Empirical studies in Car Racing and HalfCheetah domains report that LazyDAgger achieves 95–99% of DAgger’s cumulative reward while reducing expert queries by 40–60%. In comparison, random sampling at the same $\alpha\%$ rate typically requires more queries to attain the same performance level, underscoring the importance of uncertainty-driven selection. No detailed experiment tables or plots are provided, but trends indicate that increasing the ensemble size $M$ sharpens the uncertainty signal at proportional compute cost (Haridas et al., 2023).

4.2 Context-Switch Reduction LazyDAgger

In simulated MuJoCo tasks (HalfCheetah-v2, Walker2D-v2, Ant-v2) with TD3 supervisors, LazyDAgger reduces context switches by 46–79% over SafeDAgger, maintains or improves test-time reward, and matches supervisor action counts. In a physical ABB YuMi robot fabric manipulation task, LazyDAgger reduced context switches by 60% and scored 8/10 task successes (vs. SafeDAgger’s 2/10 and Behavioral Cloning’s 0/10), with only modest growth in supervisor actions reflecting longer intervention segments (Hoque et al., 2021).

Task          Switch Reduction   Success Rate    Supervisor Actions (LazyDAgger / SafeDAgger)
HalfCheetah   79%                ≥ SafeDAgger    ≈ equal
Walker2D      56%                ≥ SafeDAgger    ≈ equal
Ant           46%                ≥ SafeDAgger    ≈ equal
YuMi Robot    60%                8/10 vs 2/10    43 vs 34

5. Practical Considerations and Extensions

For DADAgger, the number of dropout ensemble members $M$ (10–20 is typical) balances uncertainty estimation reliability against compute cost. The query threshold can be specified as a percentile ($\alpha$) for convenience. Computing per-step uncertainty during rollouts scales as $M \cdot T$ forward passes per episode. Extensions include employing alternative uncertainty measures (e.g., entropy), adaptive adjustment of $\alpha$, or adaptation to classification/discrete actions (Haridas et al., 2023).

The noise variance $\sigma^2$ and the hysteresis thresholds $(\tau_{\text{sup}}, \tau_{\text{auto}})$ must be tuned in the context-switching variant, with practical benefits most compelling when the context-switch latency $L$ is substantial (e.g., human operation). Potential future work includes online adaptation of thresholds, noise distribution learning, or extensions to multi-agent/fleet settings (Hoque et al., 2021).

6. Implications and Limitations

LazyDAgger provides a direct approach to reducing sample and supervisory costs in imitation learning by focusing effort on policy uncertainty or targeted, longer interventions. Its primary empirical benefit is substantially fewer expert queries or context switches while preserving learning efficacy. However, effectiveness hinges on accurate uncertainty estimation (in DADAgger) or well-tuned switching and noise parameters (in context-switching LazyDAgger). The need for hyperparameter selection, especially in non-simulated or complex environments, remains a challenge. No formal theoretical sample complexity or regret improvements are currently established for these variants. This suggests further work on formalizing performance guarantees and automating practical parameter selection could be significant for broader adoption.

LazyDAgger generalizes and extends standard DAgger [Ross et al., 2011] by introducing selective querying, and modifies SafeDAgger [Zhang et al., 2017] with asymmetric thresholds and noise injection. It is distinguished from random sample selection by leveraging model uncertainty or discrepancy classifiers to focus queries/interventions where they are most impactful. Possible future research directions include blending active learning criteria, adapting methodologies for discrete action spaces, and integrating feedback over episodic horizons.


For comprehensive technical details, see (Haridas et al., 2023) for the DADAgger/LazyDAgger method and (Hoque et al., 2021) for context-switch-minimizing LazyDAgger in interactive robotic imitation learning.

