HG-DAgger: Human-Gated Imitation Learning

Updated 8 January 2026

HG-DAgger is an interactive imitation learning algorithm that integrates human interventions with uncertainty-aware risk estimation in autonomous driving.
It employs a real-time gating mechanism and an ensemble-based uncertainty metric to selectively balance expert and novice actions during training.
Empirical evaluations show HG-DAgger reduces collision and road departure rates, offering improved safety and stability compared to behavioral cloning (BC) and standard DAgger.

HG-DAgger is an interactive imitation learning algorithm specifically designed to accommodate human experts in real-world systems. It addresses inherent deficiencies in classical behavioral cloning (BC) and Dataset Aggregation (DAgger), especially in the context of high-stakes domains like autonomous driving, where compounding errors, safety, and human label quality are central concerns. HG-DAgger introduces a principled, human-gated control interface and uncertainty-aware risk estimation, yielding superior safety, sample efficiency, and human-likeness in learned policies (Kelly et al., 2018).

1. Background: Imitation Learning and DAgger

Imitation learning seeks a novice policy $\pi_N$ that closely mimics an expert $\pi_H$, usually through supervised learning on expert demonstrations $\{(x, a_H = \pi_H(x))\}$:

$\min_{\theta}\sum_{(x,a_H)\in\mathcal D_{\rm BC}} \|\pi_N^\theta(x) - a_H\|^2.$
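As a minimal sketch of this objective, the snippet below assumes a scalar state and a linear novice $\pi_N^\theta(x) = w x + b$. Both are simplifying assumptions, not the setup used by Kelly et al. Under them, the squared-error minimizer has the closed ordinary-least-squares form. The `expert` function and the grid of states are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (state x, expert action a_H)


def fit_linear_bc(dataset: List[Sample]) -> Tuple[float, float]:
    """Return (w, b) minimizing sum((w*x + b - a_H)**2) by ordinary least squares."""
    n = len(dataset)
    mean_x = sum(x for x, _ in dataset) / n
    mean_a = sum(a for _, a in dataset) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in dataset)
    cov_xa = sum((x - mean_x) * (a - mean_a) for x, a in dataset)
    w = cov_xa / var_x  # requires at least two distinct states
    b = mean_a - w * mean_x
    return w, b


def expert(x: float) -> float:
    """Hypothetical expert: steer back toward the lane center in proportion to the offset."""
    return -0.5 * x


# D_BC: states visited by the expert, each labeled with the expert's own action.
d_bc: List[Sample] = [(k / 10, expert(k / 10)) for k in range(-10, 11)]
w, b = fit_linear_bc(d_bc)
```

Because the toy expert is itself linear, the fit recovers it (w ≈ -0.5, b ≈ 0). The weakness BC has is not in this fit but in which states end up in `d_bc`.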

Behavioral cloning trains $\pi_N$ solely on states encountered by $\pi_H$, which causes distributional shift: at test time $\pi_N$ may reach unfamiliar states, where its errors compound. DAgger counters this by stochastically interleaving expert and novice actions during data collection through a Bernoulli gate (the expert acts with probability $\beta$). The expert labels every visited state regardless of which policy acted, and aggregating these labeled states reduces the mismatch between training and test distributions.
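The DAgger data-collection step can be sketched as follows. This is a minimal sketch under stated assumptions: a hypothetical 1-D system $x_{t+1} = x_t + a_t$, and made-up `expert` and `novice` functions. A Bernoulli($\beta$) draw decides whose action is executed, and the expert's label is stored for every state either way.

```python
import random
from typing import Callable, List, Tuple

Policy = Callable[[float], float]


def dagger_rollout(expert: Policy, novice: Policy, beta: float, x0: float,
                   horizon: int, rng: random.Random) -> List[Tuple[float, float]]:
    """Collect one DAgger rollout on a toy 1-D system x_{t+1} = x_t + a_t.

    A Bernoulli(beta) gate decides whose action is executed, but the expert's
    label is recorded for every visited state, whichever policy acted.
    """
    data: List[Tuple[float, float]] = []
    x = x0
    for _ in range(horizon):
        a_expert = expert(x)
        data.append((x, a_expert))   # expert label for every visited state
        if rng.random() < beta:      # gate: expert acts with probability beta
            a = a_expert
        else:
            a = novice(x)
        x = x + a                    # toy dynamics (assumption)
    return data


def expert(x: float) -> float:
    return -0.5 * x  # hypothetical corrective expert


def novice(x: float) -> float:
    return 0.1  # hypothetical untrained novice that simply drifts


batch = dagger_rollout(expert, novice, beta=0.5, x0=1.0, horizon=20, rng=random.Random(0))
```

With `beta=1.0` the rollout reduces to pure expert demonstrations (BC-style data). With `beta=0.0` the novice drives while the expert only labels. That second regime is where the human-in-the-loop problems described next arise.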

However, traditional DAgger has two limitations in human-in-the-loop scenarios: (1) the expert must provide corrective labels in real time without having full control of the system, which degrades both safety and label quality; (2) actuator lag exacerbates these problems in dynamic settings.

2. Algorithmic Structure of HG-DAgger

HG-DAgger modifies DAgger to ensure that the human expert can gate control at will, fully intervening in real time when the novice policy enters perceived unsafe regions. The components are:

Gating Rule: The human expert decides on the fly which states are acceptable, thereby defining a permitted set of states $\mathcal P \subseteq \mathcal X$; the gating function is

$g(x_t) = \begin{cases} 1, & x_t \notin \mathcal P \\ 0, & x_t \in \mathcal P \end{cases}$

where $g(x_t) = 1$ denotes expert control and $g(x_t) = 0$ denotes novice control.

Policy Rollout: The joint control policy during data gathering is

$\pi_i(x_t) = g(x_t)\pi_H(x_t) + [1-g(x_t)]\pi_{N_i}(o_t),$

where $o_t = \mathcal O(x_t)$ is the observation available to the novice.
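The gate and the joint rollout policy can be sketched as follows. In HG-DAgger, the decision $x_t \notin \mathcal P$ is made by the human, so `in_permitted_set` below is only an illustrative stand-in with made-up thresholds. The two-component state, the identity observation map, and the `expert` and `novice` functions are likewise assumptions.

```python
from typing import Callable, Tuple

State = Tuple[float, float]        # hypothetical state: (lateral offset y [m], gap to obstacle [m])
Observation = Tuple[float, float]


def in_permitted_set(x: State) -> bool:
    """Illustrative stand-in for the human's judgment of P (assumed rule, not from the paper)."""
    y, gap = x
    return abs(y) <= 1.0 and gap >= 5.0


def gate(x: State) -> int:
    """g(x_t): 1 = expert control (x_t outside P), 0 = novice control (x_t inside P)."""
    return 0 if in_permitted_set(x) else 1


def observe(x: State) -> Observation:
    """O(x_t); assumed here to expose the full state to the novice."""
    return x


def mixed_policy(x: State,
                 expert: Callable[[State], float],
                 novice: Callable[[Observation], float]) -> float:
    """pi_i(x_t) = g(x_t) pi_H(x_t) + (1 - g(x_t)) pi_{N_i}(o_t).

    Because g is binary, only the policy currently in control is evaluated.
    """
    if gate(x) == 1:
        return expert(x)
    return novice(observe(x))


def expert(x: State) -> float:
    return -0.8 * x[0]  # hypothetical corrective steering


def novice(o: Observation) -> float:
    return -0.3 * o[0]  # hypothetical novice that under-corrects
```

Unlike the Bernoulli gate in DAgger, this gate is a deterministic function of the state. Control therefore stays with the expert for as long as the state remains outside $\mathcal P$.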

Uncertainty (‘Doubt’) Metric: The novice $\pi_N$ is an ensemble of $M$ neural networks, each producing an action $a_t^{(k)}$ for input $o_t$. Their empirical covariance is computed as

$C_t = \frac{1}{M-1}\sum_{k=1}^M (a_t^{(k)} - \bar a_t)(a_t^{(k)} - \bar a_t)^\top,\quad \bar a_t = \frac{1}{M}\sum_k a_t^{(k)}.$

The scalar doubt is defined as

$d_N(o_t) = \|\mathrm{diag}(C_t)\|_2,$

where a higher $d_N$ indicates greater epistemic uncertainty, i.e., states in which the novice is at greater risk of performing poorly.
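As a worked illustration of $C_t$ and $d_N$, the snippet below is a minimal, dependency-free sketch. The member actions are made-up (steering, speed) pairs, not outputs of a trained ensemble. It uses the unbiased $1/(M-1)$ normalization and takes the Euclidean norm of the covariance diagonal.

```python
import math
from typing import List, Sequence


def ensemble_covariance(actions: Sequence[Sequence[float]]) -> List[List[float]]:
    """Unbiased covariance C_t of the M member actions a_t^(k); needs M >= 2."""
    m, dim = len(actions), len(actions[0])
    mean = [sum(a[j] for a in actions) / m for j in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for a in actions:
        dev = [a[j] - mean[j] for j in range(dim)]
        for r in range(dim):
            for c in range(dim):
                cov[r][c] += dev[r] * dev[c] / (m - 1)
    return cov


def doubt(actions: Sequence[Sequence[float]]) -> float:
    """d_N = ||diag(C_t)||_2, the Euclidean norm of the per-dimension variances."""
    cov = ensemble_covariance(actions)
    return math.sqrt(sum(cov[j][j] ** 2 for j in range(len(cov))))


# Three hypothetical ensemble members, each proposing (steering, speed).
agree = [[0.2, 5.0], [0.2, 5.0], [0.2, 5.0]]
split = [[0.1, 5.0], [0.3, 5.0], [0.2, 5.0]]
```

When the members agree, the doubt is essentially zero. When their steering proposals spread by ±0.1, the steering variance is 0.01 and the speed variance is 0, so $d_N \approx 0.01$. Only the diagonal enters $d_N$, so correlations between action dimensions are ignored.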

Learning the Safety Threshold $\tau$: Doubt values at human interventions are accumulated in a log $\mathcal I$. After all data collection epochs, the safety threshold is set to the average of the top quartile of intervention-time doubts. With the $N$ logged values sorted in ascending order, $\mathcal I_{(1)} \le \dots \le \mathcal I_{(N)}$,

$\tau = \frac{1}{N - \lfloor 0.75N \rfloor} \sum_{i = \lfloor 0.75N \rfloor + 1}^{N} \mathcal I_{(i)}.$

This threshold acts as a data-driven risk certificate for the trained novice.
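The threshold computation can be sketched as follows. This is a minimal sketch: it sorts the log in ascending order and averages the entries from 0-based index $\lfloor 0.75N \rfloor$ onward, which are the largest $N - \lfloor 0.75N \rfloor$ values. The doubt values are illustrative only.

```python
import math
from typing import Sequence


def safety_threshold(intervention_doubts: Sequence[float]) -> float:
    """tau = mean of the top quartile of doubts logged at human interventions."""
    if not intervention_doubts:
        raise ValueError("no interventions were logged")
    ranked = sorted(intervention_doubts)      # ascending order
    start = math.floor(0.75 * len(ranked))    # 0-based index of the first top-quartile entry
    top = ranked[start:]
    return sum(top) / len(top)


log_I = [0.10, 0.40, 0.20, 0.80, 0.30, 0.60, 0.50, 0.70]  # illustrative values only
tau = safety_threshold(log_I)  # top quartile is [0.70, 0.80], so tau is 0.75
```

Averaging the top quartile, rather than taking the minimum intervention doubt, makes $\tau$ less sensitive to interventions that the expert triggered at low novice doubt.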

3. Training Loop and Data Aggregation

HG-DAgger employs an iterative learning procedure integrating supervised updates, human gating, uncertainty logging, and risk threshold inference. The core loop is:

Start from an initial behavioral cloning dataset $\mathcal D_{\rm BC}$, set $\mathcal D \leftarrow \mathcal D_{\rm BC}$, and train the initial novice policy $\pi_{N_1}$ on it.
For each epoch $i = 1, \dots, K$, and for each rollout within that epoch:
- At each timestep $t$:
  - The gating function $g(x_t)$ determines whether the expert or the novice acts.
  - When the expert intervenes ($g(x_t)=1$): execute the expert action, add $(o_t, \pi_H(x_t))$ to $\mathcal D$, and log the current doubt $d_N(o_t)$ in $\mathcal I$.
  - Otherwise, execute the novice action; no label is recorded.
- After the epoch's rollouts, retrain the novice on the enlarged dataset $\mathcal D$ to obtain $\pi_{N_{i+1}}$.
After training, compute $\tau$ from $\mathcal I$.
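The loop above can be sketched end to end on a toy problem. Everything concrete here is an assumption for illustration: the 1-D state doubles as the observation, the dynamics add a constant drift, and `human_gate` stands in for the human's judgment. The novice is a bootstrap ensemble of linear models rather than the paper's neural networks. The sketch logs the doubt on every expert-controlled step, which is one reading of the loop.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Sample = Tuple[float, float]                     # (observation o_t, expert action)
Novice = Callable[[float], Tuple[float, float]]  # o_t -> (mean action, doubt d_N)


def train_ensemble(dataset: List[Sample], m: int = 5, seed: int = 0) -> Novice:
    """Hypothetical novice: M linear models a = w*o, each fit on a bootstrap resample."""
    rng = random.Random(seed)
    weights = []
    for _ in range(m):
        boot = [rng.choice(dataset) for _ in dataset]
        sxx = sum(o * o for o, _ in boot)
        weights.append(sum(o * a for o, a in boot) / sxx if sxx > 0 else 0.0)

    def novice(o: float) -> Tuple[float, float]:
        preds = [w * o for w in weights]
        mean = sum(preds) / m
        var = sum((p - mean) ** 2 for p in preds) / (m - 1)
        return mean, var  # 1-D action, so ||diag(C_t)||_2 is just the variance
    return novice


def hg_dagger(d_bc: List[Sample], expert: Callable[[float], float],
              human_gate: Callable[[float], int], step: Callable[[float, float], float],
              epochs: int, rollouts: int, horizon: int,
              rng: random.Random) -> Tuple[Novice, List[Sample], List[float], Optional[float]]:
    dataset = list(d_bc)
    novice = train_ensemble(dataset)            # pi_{N_1} trained on D_BC
    doubt_log: List[float] = []                 # the intervention log I
    for _ in range(epochs):
        for _ in range(rollouts):
            x = rng.uniform(-2.0, 2.0)          # assumed initial-state distribution
            for _ in range(horizon):
                a_novice, d = novice(x)         # observation o_t = x_t (assumption)
                if human_gate(x) == 1:          # expert holds control
                    a = expert(x)
                    dataset.append((x, a))      # only expert-controlled steps are labeled
                    doubt_log.append(d)
                else:
                    a = a_novice
                x = step(x, a)
        novice = train_ensemble(dataset)        # retrain on the aggregated data
    ranked = sorted(doubt_log)
    top = ranked[math.floor(0.75 * len(ranked)):]
    tau = sum(top) / len(top) if top else None  # mean of the top-quartile doubts
    return novice, dataset, doubt_log, tau


def expert(x: float) -> float:
    return -0.5 * x - 0.1 * x ** 3  # hypothetical nonlinear expert


d_bc = [(k / 4, expert(k / 4)) for k in range(-4, 5)]
novice, data, doubt_log, tau = hg_dagger(
    d_bc, expert,
    human_gate=lambda x: 1 if abs(x) > 1.0 else 0,  # stand-in for the human's judgment
    step=lambda x, a: x + a + 0.2,                  # toy dynamics with a constant drift
    epochs=3, rollouts=5, horizon=10, rng=random.Random(1))
```

In this sketch, only expert-controlled steps add data. Every new sample therefore has a matching doubt entry, and the sizes satisfy `len(doubt_log) == len(data) - len(d_bc)`.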

This procedure gives the expert uninterrupted control during each intervention, so the recorded labels are actions the expert actually executed, which improves label quality. The same loop yields both the aggregated training data and the intervention log from which the risk certificate $\tau$ is derived.

4. Novice Policy and Uncertainty Estimation Architecture

The novice $\pi_N$ is instantiated as an ensemble of feed-forward neural networks. Each subnetwork processes the concatenated observation vector

$o = [y, \theta, s, l_l, l_r, d_l, d_r]$

where $(y, \theta, s)$ describe the vehicle's lateral/heading/yaw state, $(l_l, l_r)$ the lane distances, and $(d_l, d_r)$ the obstacle proximities. Each subnetwork has two hidden layers (128–256 units, ReLU activations) and a continuous two-dimensional action output (steering, speed). The ensemble mean prescribes the novice action, and the ensemble covariance estimates epistemic uncertainty, serving as a scalable approximation to Gaussian-process-style risk estimation.
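A forward pass through such an ensemble can be sketched without external libraries. This is a minimal sketch, and several choices in it are ours rather than the paper's: 128 hidden units, He-uniform initialization, an ensemble size of 5, and untrained weights. The observation values are made up. The sketch returns the ensemble-mean action and the per-dimension variances, i.e. $\mathrm{diag}(C_t)$.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights [out][in], biases [out])


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    bound = math.sqrt(6.0 / n_in)  # He-uniform initialization (our choice, not from the paper)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Sequence[float], layer: Layer, relu: bool) -> List[float]:
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


class NoviceMember:
    """One ensemble member: 7 inputs -> two ReLU hidden layers -> (steering, speed)."""

    def __init__(self, hidden: int = 128, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.layers = [init_layer(7, hidden, rng),
                       init_layer(hidden, hidden, rng),
                       init_layer(hidden, 2, rng)]

    def __call__(self, o: Sequence[float]) -> List[float]:
        h = dense(o, self.layers[0], relu=True)
        h = dense(h, self.layers[1], relu=True)
        return dense(h, self.layers[2], relu=False)  # linear output head


def ensemble_forward(members: Sequence[NoviceMember],
                     o: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return the ensemble-mean action and the per-dimension variances diag(C_t)."""
    outs = [net(o) for net in members]
    m, dim = len(outs), len(outs[0])
    mean = [sum(a[j] for a in outs) / m for j in range(dim)]
    var = [sum((a[j] - mean[j]) ** 2 for a in outs) / (m - 1) for j in range(dim)]
    return mean, var


members = [NoviceMember(seed=k) for k in range(5)]  # untrained, M = 5 (assumed)
o = [0.2, 0.05, 8.0, 1.5, 2.0, 30.0, 12.0]          # made-up [y, theta, s, l_l, l_r, d_l, d_r]
action, variances = ensemble_forward(members, o)
```

Distinct seeds give the members different initializations. This diversity is what lets their disagreement act as an uncertainty signal; identically initialized and identically trained members would report zero variance everywhere.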

5. Experimental Setup and Evaluation Metrics

HG-DAgger was evaluated on both simulated and real-world autonomous driving tasks:

Simulation: A two-lane road with static obstacle cars; the novice must navigate sequences in which lanes are randomly blocked.
Real vehicle: MG-GS car equipped with LiDAR, precision localization, onboard safety driver, and an off-board human expert for HG-DAgger interventions.

Baselines included:

Behavioral Cloning (BC) with $N_0=10^4$ demonstration labels.
Standard DAgger with $\beta_0=0.85$, decayed by a factor of $0.85$ per epoch.
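To make the DAgger baseline's schedule concrete, the sketch below assumes that "decayed by 0.85 per epoch" means multiplicative decay, $\beta_i = 0.85 \cdot 0.85^{i}$ for epoch index $i = 0, 1, \dots$. The function name and the five-epoch horizon are illustrative.

```python
from typing import List


def beta_schedule(beta0: float = 0.85, decay: float = 0.85, epochs: int = 5) -> List[float]:
    """Expert-control probability per DAgger epoch under multiplicative decay (assumed)."""
    return [beta0 * decay ** i for i in range(epochs)]


betas = beta_schedule()  # approximately [0.85, 0.7225, 0.6141, 0.5220, 0.4437]
```

Under this schedule the novice is in control more than half the time from the fourth epoch on. This is the stochastic-gating regime in which the expert's labels are least tied to the actions actually executed.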

Metrics:

Collision rate (collisions per meter traveled)
Road departure rate and mean duration
Bhattacharyya distance between the novice's steering-angle distribution and that of the human reference
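The steering-similarity metric can be sketched as a discrete Bhattacharyya distance, $D_B(p, q) = -\ln \sum_i \sqrt{p_i q_i}$, between normalized histograms. The histogram estimate is an assumption (10 bins over ±0.5 rad); the paper may have estimated the distributions differently. The steering samples are made up.

```python
import math
from typing import List, Sequence


def histogram(samples: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Normalized histogram; out-of-range samples are clamped into the edge bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in samples:
        k = min(max(int((s - lo) / width), 0), bins - 1)
        counts[k] += 1
    return [c / len(samples) for c in counts]


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """D_B(p, q) = -ln sum_i sqrt(p_i q_i); 0 for identical, inf for disjoint histograms."""
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return math.inf if coefficient == 0.0 else -math.log(coefficient)


human = histogram([-0.10, -0.05, 0.0, 0.02, 0.05, 0.10], lo=-0.5, hi=0.5, bins=10)
novice = histogram([-0.30, -0.10, 0.0, 0.0, 0.15, 0.35], lo=-0.5, hi=0.5, bins=10)
dist = bhattacharyya_distance(human, novice)
```

Lower values mean the novice steers more like the human. For example, $p = (0.5, 0.5)$ and $q = (1, 0)$ give $D_B = \tfrac{1}{2}\ln 2 \approx 0.347$.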

6. Empirical Results and Analysis

In simulation, after a total of $2 \times 10^4$ sampled states (BC + DAgger/HG-DAgger), the following rates were observed:

Method	Road Departure (m⁻¹)	Collision (m⁻¹)	Bhattacharyya Distance
BC	$\approx 1.2\times10^{-3}$	$\approx 0.9\times10^{-3}$	0.1173
DAgger	$\approx 1.5\times10^{-3}$	$\approx 1.2\times10^{-3}$	0.1057
HG-DAgger	$\approx 0.5\times10^{-3}$	$\approx 0.3\times10^{-3}$	0.0834

DAgger suffered from late-epoch instability, plausibly due to degraded label quality under stochastic action gating, whereas HG-DAgger learned more stably. The risk threshold was validated by partitioning states into an estimated safe set $\hat{\mathcal P} = \{x: d_N(o) \leq \tau\}$ and its complement, and $\tau$ separated low-risk from high-risk outcomes. Inside $\hat{\mathcal P}$, the reported collision/road-departure rate was $0.607\times 10^{-3}$ per meter. Outside it, the collision and road-departure rates rose to $7.53\times 10^{-3}$ and $12.09\times 10^{-3}$ per meter, respectively.
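The validation step can be sketched as a partition of logged driving by doubt, followed by per-meter event rates in each region. The `Segment` record and the numbers below are hypothetical and illustrative only; they are not the paper's data or logging format.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    doubt: float   # d_N(o) recorded for this stretch of driving
    meters: float  # distance traveled during the stretch
    collisions: int
    departures: int


def rates_by_region(segments: List[Segment], tau: float) -> Dict[str, Dict[str, float]]:
    """Events per meter inside P_hat = {d_N <= tau} and outside it (nan if no distance)."""
    result: Dict[str, Dict[str, float]] = {}
    for region, inside in (("inside", True), ("outside", False)):
        group = [s for s in segments if (s.doubt <= tau) == inside]
        meters = sum(s.meters for s in group)
        result[region] = {
            "collisions_per_m": sum(s.collisions for s in group) / meters if meters else math.nan,
            "departures_per_m": sum(s.departures for s in group) / meters if meters else math.nan,
        }
    return result


# Illustrative numbers only; these are not the paper's data.
segments = [Segment(0.1, 1000.0, 0, 1), Segment(0.2, 1000.0, 0, 0), Segment(0.9, 500.0, 2, 3)]
rates = rates_by_region(segments, tau=0.5)
```

Normalizing by distance rather than by segment count keeps the two regions comparable even when the novice spends far more time inside $\hat{\mathcal P}$.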

On-vehicle, HG-DAgger achieved zero collisions and zero departures, outperforming both BC and DAgger.

7. Discussion, Limitations, and Prospective Directions

HG-DAgger presents a practical solution for real-world imitation learning with human experts. By preserving the expert's ability to intervene at will, the algorithm yields higher-quality expert labels and safer data collection. The risk threshold $\tau$, learned directly from intervention-time uncertainty, serves as an actionable certificate for downstream safety filtering.

Limitations include reliance on the expert's real-time judgment of which states are unsafe and the absence of formal regret or safety guarantees. The ensemble-based uncertainty metric, while tractable, may underestimate epistemic uncertainty. Prospective directions include automating the gating process with the learned rule $d_N(o) > \tau$, employing richer uncertainty estimators such as Bayesian neural networks or MC-dropout, and extending the methodology to multi-modal or formally verifiable safety-critical domains (Kelly et al., 2018).
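An automated gate based on the learned threshold could look like the sketch below. This direction is prospective, so `auto_gate`, `gated_action`, `toy_novice` and the `stop` fallback are all hypothetical. States with $d_N(o) \le \tau$ count as inside $\hat{\mathcal P}$, consistent with the partition used in the validation above.

```python
from typing import Callable, Sequence, Tuple

Observation = Sequence[float]
Action = Sequence[float]


def auto_gate(doubt: float, tau: float) -> int:
    """Hypothetical automated gate: 1 (hand over control) when d_N(o) > tau, else 0."""
    return 1 if doubt > tau else 0


def gated_action(o: Observation,
                 novice: Callable[[Observation], Tuple[Action, float]],
                 fallback: Callable[[Observation], Action],
                 tau: float) -> Action:
    """Run the novice inside P_hat = {d_N <= tau}; defer to a fallback outside it."""
    action, doubt = novice(o)
    return fallback(o) if auto_gate(doubt, tau) == 1 else action


def toy_novice(o: Observation) -> Tuple[Action, float]:
    return [0.1, 5.0], o[0]  # hypothetical: doubt grows with the first observation entry


def stop(o: Observation) -> Action:
    return [0.0, 0.0]        # hypothetical conservative fallback controller
```

The fallback could be the human expert, as in HG-DAgger's training loop, or a conservative controller. Which one is appropriate depends on the deployment setting.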


References

Kelly et al. HG-DAgger: Interactive Imitation Learning with Human Experts (2018).
