
Uncertainty-Aware Action Labeling

Updated 9 February 2026
  • The paper introduces uncertainty-aware action labeling methods that integrate uncertainty quantification into both adaptive labeling and weakly supervised detection pipelines.
  • It formulates adaptive labeling as a finite-horizon Markov decision process and employs Smoothed-Autodiff optimization to reduce gradient variance effectively.
  • For weakly supervised video detection, uncertainty modeling adjusts loss weights based on predicted variance, thus enhancing performance under noisy annotations.

Uncertainty-aware action labeling refers to a class of methods that explicitly account for quantifiable uncertainty when labeling actions in data, particularly when annotation is expensive or supervision is weak. These methods extend standard labeling and detection pipelines with principled uncertainty quantification, guiding both labeling effort (e.g., in active or adaptive enumeration) and model training (e.g., in weakly-supervised detection), thereby improving labeling efficiency and the reliability of model predictions. Two prominent domains in which uncertainty-aware action labeling has been formalized are adaptive labeling under budget constraints and weakly-supervised spatio-temporal action detection in videos. Both leverage probabilistic estimates and tailored loss formulations to address ambiguity and noise in supervision.

1. Adaptive Labeling as a Markov Decision Process

Adaptive labeling can be formulated as a finite-horizon Markov decision process (MDP) in which the state encapsulates the current posterior beliefs about the data-generating function and the remaining labeling budget. Specifically, at step $t$, the state $s_t = (\mu_t, b_t)$ comprises the posterior belief $\mu_t(\cdot)$ and the budget $b_t$, where $\mu_t$ is the posterior over $f$ induced by the prior $\mu$ and the data observed in the first $t$ batches. Actions correspond to the selection of a batch of $K_t$ unlabeled inputs $X^{t+1}$, represented as indicator vectors $S \in \{0, 1\}^n$ with $\sum_i S_i = K_t$. The transition kernel draws batch labels and updates the posterior, encapsulating randomness from both true label generation and any posterior approximation.
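
As a concrete toy instantiation of this MDP, the sketch below represents the posterior $\mu_t$ by an ensemble of sampled functions, the state as an (ensemble, budget) pair, and the transition as an importance-resampling posterior update. All names, the random-features prior, and the resampling update are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy adaptive-labeling MDP: the posterior mu_t is an ensemble of sampled
# functions, the state is (ensemble, budget), and an action is a K-subset
# indicator over the unlabeled pool.
n_pool, n_ensemble, K = 20, 50, 3
X_pool = np.linspace(0.0, 1.0, n_pool)

def sample_prior_ensemble():
    """Ensemble approximating the prior mu over f (quadratic random features)."""
    w = rng.normal(size=(n_ensemble, 3))
    feats = np.stack([np.ones_like(X_pool), X_pool, X_pool**2], axis=1)
    return w @ feats.T   # (n_ensemble, n_pool): each row is one f on the pool

def transition(ensemble, budget, S, y_obs, noise=0.1):
    """Transition kernel: condition the ensemble on the new labels via
    importance resampling (an approximate posterior update) and decrement
    the remaining budget."""
    idx = np.flatnonzero(S)
    log_w = -0.5 * ((ensemble[:, idx] - y_obs) ** 2 / noise**2).sum(axis=1)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    keep = rng.choice(n_ensemble, size=n_ensemble, p=w)
    return ensemble[keep], budget - 1

ensemble, budget = sample_prior_ensemble(), 2        # state s_0
S = np.zeros(n_pool, dtype=int)
S[[0, 10, 19]] = 1                                   # action: a K-subset
y_obs = ensemble[0, [0, 10, 19]]                     # stand-in observed labels
ensemble, budget = transition(ensemble, budget, S, y_obs)
```

An exact conjugate update (e.g., a GP posterior) could replace the resampling step without changing the state/action interface.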

The expected terminal uncertainty is quantified with functionals such as $G(\mu_T) = \operatorname{Var}_{f \sim \mu_T}[g(f)]$, where $g(f)$ is the downstream quantity of interest (e.g., mean squared error, ATE). The optimal labeling policy minimizes $\mathbb{E}_\pi[G(\mu_T)]$, where $\pi = (\pi_0, \dots, \pi_{T-1})$ are batch-selection policies parameterized by the posterior (2502.06076).
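
The terminal functional $G(\mu_T)$ is straightforward to estimate from posterior samples. Assuming, purely for illustration, that the posterior is represented by an ensemble and that $g(f)$ is the pool mean of $f$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Ensemble approximation of the terminal posterior mu_T: M sampled functions
# f, each evaluated on the n pool points (values here are synthetic).
M, n = 200, 50
ensemble = rng.normal(loc=1.0, scale=0.5, size=(M, n))

g_vals = ensemble.mean(axis=1)   # g(f) for each posterior sample f
G = g_vals.var()                 # G(mu_T) = Var_{f ~ mu_T}[g(f)]
```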

2. Policy Parameterization and Optimization in Adaptive Labeling

Policy parameterization typically employs a continuous scoring function $w \in \mathbb{R}_+^n$ over the candidate pool, with $w(\theta) = \phi_\theta(\mathrm{emb}(\mu_t))$ generated by, for example, an MLP operating on an embedding of the posterior. Batch actions are sampled using weighted $K$-subset sampling (sequential sampling without replacement, summed over orderings):

$$p(S \mid w) \propto \sum_{\sigma \in \mathrm{Perm}(S)} \prod_{j=1}^{K} \frac{w_{\sigma(j)}}{\sum_{\ell} w_\ell - \sum_{m < j} w_{\sigma(m)}}.$$

While the policy is smooth in $\theta$ except for the discrete subset selection, the non-differentiable pipeline poses optimization challenges. Traditional score-function (REINFORCE) estimators for the policy gradient,

$$\nabla_\theta H(\theta) = \mathbb{E}_S\!\left[G(\mu_+^S)\,\nabla_\theta \log \pi_\theta(S)\right],$$

exhibit extreme variance in large combinatorial spaces due to the near-zero probabilities of particular $K$-subsets.
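
The variance problem is easy to reproduce. In the toy sketch below (a 1-subset policy over a large pool, with a stand-in terminal objective `G_hat` that is not from the paper), a single-sample REINFORCE estimate is dominated by the score term of whichever item happened to be drawn:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-subset policy over a large pool: theta are selection logits and
# G_hat is a stand-in terminal-uncertainty value (illustrative only).
n = 1000
theta = np.zeros(n)

def G_hat(i):
    # pretend item 0 is the uniquely informative point
    return 0.0 if i == 0 else 1.0

def reinforce_grad(theta, n_samples):
    """Score-function estimator: E_S[G(S) * grad log pi_theta(S)]."""
    p = np.exp(theta - theta.max())
    p /= p.sum()
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        i = rng.choice(n, p=p)
        score = -p.copy()          # d/dtheta log p(i) = e_i - p
        score[i] += 1.0
        grad += G_hat(i) * score
    return grad / n_samples

g_single = reinforce_grad(theta, n_samples=1)   # one rollout: mostly noise
```

Only after averaging thousands of rollouts does the estimate become useful, consistent with the paper's observation that REINFORCE needs on the order of thousands of samples per step.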

The Smoothed-Autodiff approach introduces differentiable surrogates for both sampling (soft $K$-subset sampling via a tempered Gumbel-softmax) and the posterior update (weighted updates for GPs/ensembles). For temperature $\tau > 0$, a continuous relaxation $a(\theta) \in [0, 1]^n$ with $\sum_i a_i = K$ enables pathwise gradient computation via autograd:

$$\nabla_\theta H_\tau(\theta) = \mathbb{E}_{\text{gumbels}}\!\left[\partial_\theta G(\mu_+^{a(\theta)})\right].$$

This trades a small bias ($O(\tau^2)$) for significantly reduced gradient variance and orders-of-magnitude faster learning, as both theory and experiments confirm (2502.06076).
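
One common continuous relaxation of $K$-subset sampling, in the spirit of the tempered Gumbel-softmax surrogate described above (though not necessarily the paper's exact construction), perturbs the logits with Gumbel noise and applies $K$ successive tempered softmaxes, suppressing already-allocated mass in log-space:

```python
import numpy as np

rng = np.random.default_rng(3)

def soft_k_subset(logits, K, tau, rng):
    """Relaxed K-subset sample: Gumbel-perturb the logits, then take K
    successive tempered softmaxes, down-weighting already-selected mass in
    log-space. Returns a relaxed K-hot vector a with sum(a) = K; a hard
    top-K indicator is recovered as tau -> 0. (A sketch of the idea, not
    the paper's exact surrogate.)"""
    alpha = logits + rng.gumbel(size=logits.shape)
    a = np.zeros_like(logits)
    for _ in range(K):
        z = (alpha - alpha.max()) / tau
        p = np.exp(z)
        p /= p.sum()
        a = a + p
        # suppress what was just (softly) selected
        alpha = alpha + np.log(np.clip(1.0 - p, 1e-12, 1.0))
    return a

a = soft_k_subset(np.zeros(10), K=3, tau=0.5, rng=rng)
```

Because every coordinate of $a(\theta)$ varies smoothly with the logits for $\tau > 0$, gradients can flow through the selection into the surrogate posterior update, which is what the pathwise estimator requires.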

3. Uncertainty-Aware Weakly Supervised Action Detection

In spatio-temporal action detection from untrimmed videos, direct pixel- or frame-level labeling is often impractical. The uncertainty-aware multiple-instance learning (MIL) framework treats each video clip as a bag $X_i$ with (possibly multiple) video-level labels $y_i \in \{0, 1\}^C$. Person tubelets, generated by linking COCO-trained Faster-RCNN detections into $K$-frame segments, constitute the MIL instances.

The core model, a SlowFast ResNet50-based network with parallel classification and uncertainty heads, predicts per-tubelet logits $f_{i,j} \in \mathbb{R}^C$ and log-variances $v_{i,j} = \log \sigma_{i,j}^2 \in \mathbb{R}^C$ (output by a softplus-activated predictor). Class probabilities for each tubelet are obtained via a sigmoid; bag-level predictions are formed via max-pooling,

$$p_{i,l} = \max_j p_{i,j,l},$$

which best matches the “at least one instance” MIL prior.
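
A minimal numpy sketch of the bag-level forward pass (shapes and logit values are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Per-tubelet logits for one bag: J = 3 tubelets, C = 2 classes.
f = np.array([[ 2.0, -1.0],
              [-3.0,  0.5],
              [ 0.1, -2.0]])

p_inst = sigmoid(f)             # per-tubelet class probabilities
p_bag = p_inst.max(axis=0)      # p_{i,l} = max_j p_{i,j,l}
j_star = p_inst.argmax(axis=0)  # tubelet attached to each class
```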

Per-class, per-bag uncertainty is attached to the tubelet maximizing pi,j,lp_{i, j, l} using its predicted σi,j,l\sigma_{i,j,l}, yielding σ~i,l2=exp(vi,j,l)\tilde{\sigma}_{i, l}^2 = \exp(v_{i, j^*, l}) for j=argmaxjpi,j,lj^* = \arg\max_j p_{i, j, l}. The final loss weights cross-entropy inversely by this uncertainty, following [Kendall & Gal]: Li,l=1σ~i,l2[yi,llogpi,l(1yi,l)log(1pi,l)]+logσ~i,l2.\mathcal{L}_{i,l} = \frac{1}{\tilde{\sigma}_{i, l}^2} \left[ -y_{i, l}\log p_{i, l} - (1-y_{i, l})\log(1 - p_{i, l}) \right] + \log \tilde{\sigma}_{i, l}^2. This enables the model to hedge its predictions when labels are noisy or missing, as high uncertainty reduces penalty (Arnab et al., 2020).
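
Putting the pooling and the uncertainty weighting together, a minimal sketch of the per-bag loss, assuming raw logits $f$ and log-variances $v$ for one bag (names and values are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def uncertainty_weighted_bce(f, v, y, eps=1e-7):
    """Kendall-and-Gal-style loss for one bag: max-pool instance
    probabilities per class, read off the log-variance of the max instance,
    and weight the binary cross-entropy by the inverse predicted variance
    plus a log-variance regularizer."""
    p_inst = sigmoid(f)                              # (J, C)
    j_star = p_inst.argmax(axis=0)                   # argmax tubelet per class
    p = p_inst.max(axis=0).clip(eps, 1.0 - eps)      # bag probability p_{i,l}
    var = np.exp(v[j_star, np.arange(f.shape[1])])   # sigma~_{i,l}^2
    bce = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return float((bce / var + np.log(var)).sum())

f = np.array([[ 2.0, -1.0],
              [-3.0,  0.5]])       # (J=2 tubelets, C=2 classes) logits
v = np.zeros_like(f)               # log-variance 0  ->  variance 1
y = np.array([1.0, 0.0])           # video-level labels
loss = uncertainty_weighted_bce(f, v, y)
```

Raising the predicted log-variance at the selected tubelet shrinks the $1/\tilde{\sigma}^2$ data term at the cost of the $\log \tilde{\sigma}^2$ regularizer, which is the hedging behavior described above.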

4. Empirical Evaluation and Comparative Performance

Adaptive labeling using the MDP formulation and Smoothed-Autodiff optimization distinctly outperforms standard heuristics and REINFORCE-style policy gradients on both synthetic and real datasets. In a synthetic regression task with $n \approx 500$ unlabeled points, a one-step lookahead policy using Smoothed-Autodiff (with $\tau = 0.1$, $N = 1$) achieved posterior variance $\approx 0.03$, compared to static uncertainty sampling ($\approx 0.13$) and random sampling ($\approx 0.45$). MSE on held-out data followed the same trend. The Smoothed-Autodiff approach required only one rollout per iteration to surpass REINFORCE using $N = 2000$ (2502.06076).

In weakly-supervised video action detection, uncertainty modeling yielded consistent improvements. On UCF101-24, standard MIL with max pooling achieved VideoAP at IoU = 0.2/0.5 of 60.7/33.5; adding uncertainty-aware weighting increased these to 61.7/35.0 (about 80% of the fully supervised performance), surpassing prior state-of-the-art weakly supervised methods by wide margins. On AVA, as supervision was weakened (longer sub-clips per bag), the gap to fully-supervised results widened, but the uncertainty-aware framework remained robust, with, e.g., 22.4 FrameAP for $N = 1$ s clips (90% of the fully-supervised baseline of 24.9) and 4.2 for full-video bags (Arnab et al., 2020).

5. Impact of Uncertainty Modeling and Pooling Choices

In both adaptive labeling and weakly-supervised action detection, incorporating explicit uncertainty quantification substantially enhances the system's ability to cope with noisy, sparse, or ambiguously labeled data. In large-pool adaptive selection, pathwise Smoothed-Autodiff gradients yielded significantly lower mean-squared error in gradient estimation and set new benchmarks in uncertainty reduction rates for the same labeling budget. For MIL-based action detection, the uncertainty head enabled the model to deflate penalties for incomplete coverage (empty or misdetected bags), consistently improving action detection scores across batch and instance sampling setups.

Pooling choice is significant. Max-pooling best enforces the standard MIL prior when distractor instances are present. Uncertainty-aware losses further ameliorate the impact of missing positives or false negatives, as the model can “hedge” with high variance predictions rather than apply incorrect forced attributions. This flexibility is crucial as sampling more instances per bag or fewer bags per batch can otherwise degrade normalization statistics, but uncertainty-based weighting mitigates the impact.

6. Broader Implications and Theoretical Perspective

Uncertainty-aware action labeling bridges probabilistic inference, deep learning, and decision theory in data annotation and real-world detection. The MDP-based adaptive labeling framework is agnostic to particular uncertainty quantification techniques (posterior from GPs, ensembles, etc.) and supports a variety of policy optimization mechanisms via continuous parameterizations. The introduction of differentiable surrogates for combinatorial selection and Bayesian updating opens new avenues for scalable, efficient exploration under finite budgets.

In probabilistic MIL for action detection, explicit per-instance uncertainty predictions sidestep strong assumptions about positive instance existence. This approach equates to learning with a heteroscedastic Boltzmann likelihood, theoretically justifying the joint learning of predictive confidence and class likelihood. A plausible implication is that such methods are transferable to other annotation-constrained regimes, including rare event detection and batch-mode active learning.

7. Summary Table: Methodological Comparison

| Domain | Uncertainty Role | Optimization Approach |
| --- | --- | --- |
| Adaptive labeling | Guides budget allocation via posterior variance on the estimand | Smoothed-Autodiff (pathwise); REINFORCE (score function) |
| Weakly-supervised detection | Weights the cross-entropy loss per instance/bag; hedges against noisy or missing labels | Joint logit and log-variance (uncertainty) heads; MIL with probabilistic pooling |

The surveyed frameworks demonstrate that uncertainty-aware labeling, by formalizing and leveraging uncertainty estimates during both selection and training, significantly improves both efficiency and final model accuracy under practical constraints (2502.06076, Arnab et al., 2020).
