Dynamic & Class-Aware Thresholding

Updated 15 March 2026

This overview covers dynamic thresholding techniques that use class-specific empirical statistics to adjust decision boundaries, yielding improved calibration and efficiency.
It outlines methodologies like quantile alignment, Otsu-based residual thresholding, and online F1 maximization to address data imbalance and optimize performance.
Empirical results highlight significant gains, such as up to 23% GFLOPs reduction on segmentation tasks and enhanced metrics in pseudo-labeling and outlier detection.

Dynamic and class-aware thresholding encompasses a suite of statistical and algorithmic frameworks devised to adapt decision boundaries for classification, segmentation, pseudo-labeling, outlier detection, and calibration tasks. These frameworks eschew static, one-size-fits-all cutoffs in favor of thresholds that adjust in response to local class statistics, confidence scores, empirical priors, or even evolving test-time conditions. This paradigm arises from limitations of conventional global thresholding, especially in the presence of data imbalance, class heterogeneity, or distributional shift, and yields pronounced improvements in computational efficiency, metric calibration, and robust generalization.

1. Formal Models and Mathematical Principles

Dynamic and class-aware thresholding recasts the selection of operating points (i.e., thresholds) as a class- or sample-dependent optimization, often grounded in score quantiles, empirical class priors, or distributional statistics. In the case of binary or multi-class classifiers, let $p_c(x)$ denote the model's posterior or confidence for class $c$. The simplest form introduces a data-dependent threshold $\tau_c$ for each class $c$:

$\hat{Y}_c(x; \tau_c) = \begin{cases} 1 & \text{if } p_c(x) > \tau_c \\ 0 & \text{otherwise} \end{cases}$

For multi-label or multi-class scenarios, thresholds may be derived so that pseudo-label assignments over the $m$ unlabeled samples $x_1, \dots, x_m$ respect estimated class proportions $\hat{p}_c$, as in

$\tau_c^+ = \sup\left\{ t \in [0,1] : \frac{1}{m}\sum_{j=1}^m 1\{p_c(x_j) \ge t\} \ge \hat{p}_c \right\}$

and similarly for negatives (Xie et al., 2023).
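
As a rough illustration of the quantile rule above, the following Python sketch computes $\tau_c^+$ for each class by taking the $k$-th largest score, with $k = \lceil \hat{p}_c m \rceil$. The function name `positive_thresholds`, the array layout, and the small floating-point tolerance are assumptions made for this example and are not taken from Xie et al. (2023).

```python
import math
import numpy as np

def positive_thresholds(scores, class_priors):
    """Per-class positive thresholds by quantile alignment (illustrative).

    scores: (m, C) array of per-class confidences in [0, 1].
    class_priors: length-C sequence of estimated positive proportions.
    """
    scores = np.asarray(scores, dtype=float)
    m, num_classes = scores.shape
    taus = np.ones(num_classes)  # a prior of zero leaves the supremum at 1
    for c in range(num_classes):
        # Number of samples that must score at or above the cutoff.
        k = math.ceil(class_priors[c] * m - 1e-12)
        if k <= 0:
            continue
        k = min(k, m)
        # The largest admissible t is the k-th largest score of class c.
        taus[c] = np.sort(scores[:, c])[::-1][k - 1]
    return taus

scores = np.array([[0.9, 0.2],
                   [0.8, 0.6],
                   [0.3, 0.7],
                   [0.1, 0.4]])
taus = positive_thresholds(scores, class_priors=[0.5, 0.25])
pseudo_labels = scores >= taus  # non-strict, matching the formula
```

With tied scores, more than $k$ samples can clear the cutoff, so the realized positive rate may slightly exceed $\hat{p}_c$.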

Other methodologies ground threshold selection in distributional separation, maximizing between-class variance (Otsu-style) over loss/residual histograms to delineate inliers from outliers (Sun, 2022), or optimize for balanced F1 score under severe imbalance (Bohara, 2020, Hong et al., 2016).
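
A minimal sketch of the Otsu-style step, assuming scalar residuals and a fixed-width histogram. It shows a single thresholding pass that maximizes between-class variance; it is not the full multi-layer estimator of Sun (2022), and the name `otsu_threshold` and the bin count are illustrative choices.

```python
import numpy as np

def otsu_threshold(residuals, bins=64):
    """Return a cutoff that maximizes between-class variance of a histogram."""
    residuals = np.asarray(residuals, dtype=float)
    counts, edges = np.histogram(residuals, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    weights = counts / counts.sum()

    w0 = np.cumsum(weights)              # mass of the low-residual group
    w1 = 1.0 - w0                        # mass of the high-residual group
    cum_mean = np.cumsum(weights * centers)
    total_mean = cum_mean[-1]

    # Between-class variance for a split after each bin; skip empty groups.
    valid = (w0 > 1e-12) & (w1 > 1e-12)
    between = np.zeros_like(w0)
    between[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / (
        w0[valid] * w1[valid]
    )
    k = int(np.argmax(between))
    return edges[k + 1]                  # upper edge of the last "inlier" bin

# Synthetic example: 50 small residuals and 50 large ones.
residuals = np.concatenate([np.linspace(0.0, 0.2, 50), np.linspace(4.0, 6.0, 50)])
tau = otsu_threshold(residuals)
inlier_mask = residuals < tau
```

The criterion assumes a roughly bimodal histogram; when the inlier and outlier residuals overlap heavily, the selected cutoff is much less meaningful.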

In segmentation and early-exit architectures, per-class thresholds are learned from empirical mean confidences and the separation between top and runner-up class probabilities, with additional normalization to enforce lower and upper bounds (Görmez et al., 2022).

Dynamic thresholding also appears in test-time calibration, where class-specific thresholds are updated from a cache of high-entropy outliers to dynamically suppress overconfidence in out-of-distribution (OOD) detection frameworks (Wu et al., 18 Jan 2026).
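
The bookkeeping behind such a cache can be sketched as follows. This is an assumption-laden illustration rather than the method of Wu et al.: the class name `PerClassCache`, the entropy cutoff, the buffer capacity, and the convention that the threshold is the maximum cached score are all hypothetical choices, and how the threshold is then applied to a detector's output is deliberately left out.

```python
from collections import deque
import numpy as np

class PerClassCache:
    """FIFO buffers of scores from high-entropy samples, one per class."""

    def __init__(self, num_classes, capacity=32, entropy_cutoff=1.0):
        self.buffers = [deque(maxlen=capacity) for _ in range(num_classes)]
        self.entropy_cutoff = entropy_cutoff

    def update(self, probs, ood_score):
        """Cache the score if the prediction is uncertain enough."""
        p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
        entropy = float(-np.sum(p * np.log(p)))
        if entropy >= self.entropy_cutoff:
            predicted = int(np.argmax(p))
            # deque(maxlen=...) evicts the oldest entry automatically.
            self.buffers[predicted].append(float(ood_score))

    def threshold(self, c, default=float("-inf")):
        """Class-specific threshold: largest cached score, or a default."""
        buf = self.buffers[c]
        return max(buf) if buf else default

cache = PerClassCache(num_classes=2, capacity=2, entropy_cutoff=0.5)
cache.update([0.60, 0.40], ood_score=0.3)   # uncertain: cached for class 0
cache.update([0.99, 0.01], ood_score=0.9)   # confident: ignored
cache.update([0.55, 0.45], ood_score=0.5)   # uncertain: cached for class 0
```

Because each buffer is bounded, old entries age out and the threshold can fall again as test-time conditions change.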

2. Algorithmic Implementations

Implementations vary according to the context but share key algorithmic motifs:

Distributional quantile alignment: Compute per-class thresholds that guarantee pseudo-label distribution matches the estimated class frequency from labeled samples (Xie et al., 2023). This requires sorting scores and selecting cutoffs according to empirical quantiles in each class.
Otsu-based residual thresholding: At each iteration, build a histogram of measurement residuals, then select a threshold maximizing the inter-class (inlier/outlier) variance. For high-outlier regimes, this process can be applied in multiple successive layers to progressively purify the inlier set (Sun, 2022).
Online F1 maximization: Establish the decision threshold that maximizes F1 score via one-dimensional search, initialized by the intersection of class-conditional distributions (e.g., Gaussian fits) for “auto” (positive) and “cross” (negative) classes (Bohara, 2020).
Dynamic per-class cache construction: For OOD detection, maintain FIFO buffers for each class containing high-entropy samples, then dynamically update class-specific thresholds as the maximal OOD score observed among high-entropy confounders (Wu et al., 18 Jan 2026).
Adaptive threshold modules in deep networks: Augment segmentation architectures with trainable submodules that produce per-pixel or per-class thresholds via feed-forward networks, with joint optimization using auxiliary losses (Fayzi et al., 2023).
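
The online F1 maximization motif above can be sketched as an exhaustive one-dimensional search over observed scores. This is a simplified stand-in: it omits the Gaussian-intersection initialization attributed to Bohara (2020) and assumes labeled (score, label) pairs are available; the function name `best_f1_threshold` is hypothetical.

```python
import numpy as np

def best_f1_threshold(scores, labels):
    """Return (threshold, F1) maximizing F1 over the observed score values."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_tau, best_f1 = None, -1.0
    for tau in np.unique(scores):          # candidates in ascending order
        pred = scores >= tau
        tp = int(np.sum(pred & labels))
        fp = int(np.sum(pred & ~labels))
        fn = int(np.sum(~pred & labels))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0
        if f1 > best_f1:                   # ties keep the lowest threshold
            best_tau, best_f1 = float(tau), f1
    return best_tau, best_f1

scores = [0.10, 0.20, 0.35, 0.40, 0.80, 0.90]
labels = [0, 0, 1, 0, 1, 1]
tau, f1 = best_f1_threshold(scores, labels)
```

In a streaming setting, the same search could be rerun on a sliding window of recent labeled pairs, at a cost proportional to the window size times the number of distinct scores.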

The following table summarizes several approaches:

Context	Thresholding Mechanism	Source
Multi-label pseudo-labeling	Quantile alignment w.r.t. class priors	(Xie et al., 2023)
Segmentation, early-exit networks	Per-class mean-confidence separation, scaling	(Görmez et al., 2022)
Outlier rejection in geometry	Multi-layer Otsu residual thresholding	(Sun, 2022)
Object re-ID, face recognition	Online F1 maximization, Gaussian statistics	(Bohara, 2020)
OOD detection	Per-class cache, dynamic max-score thresholds	(Wu et al., 18 Jan 2026)
Medical image segmentation	Trainable, per-pixel (per-class) thresholds	(Fayzi et al., 2023)

3. Class-Awareness and Data Imbalance

A defining attribute of these frameworks is explicit class-awareness: thresholds adapt to the empirical frequency, confidence, or in-distribution statistics of each class. In multi-label semi-supervised learning, the class prior $\hat{p}_c$ —even when estimated from scarce labeled data—is empirically and theoretically shown to match the true distribution with high accuracy, justifying its use as a quantile target (Xie et al., 2023). For segmentation and multi-exit models, "easy" classes (with strongly peaked confidence profiles) obtain lower thresholds, permitting early exit, whereas more ambiguous or rare classes retain higher thresholds (Görmez et al., 2022).

In imbalanced regimes, as in minority class detection in manufacturing or biometric identification, adaptive thresholding ensures the rare class is not overwhelmed by the majority class, countering the pitfall of fixed global cutoffs (Hong et al., 2016, Bohara, 2020). In these contexts, thresholding uses empirical class statistics to correct for skew, via node-wise adaptation in trees or data-driven optimization in probabilistic classifiers.

4. Empirical Outcomes and Evaluation

Dynamic and class-aware thresholding leads to measurable gains across multiple domains:

Early-exit segmentation: On Cityscapes, class-based thresholds achieved a $23\%$ reduction in GFLOPs versus fixed global thresholds at minor mIoU loss (<$1$ point), with even more pronounced compute–accuracy trade-offs available via threshold hyperparameters (Görmez et al., 2022).
Medical image segmentation: Adaptive, trainable thresholds improved Dice by approximately 3.8 percentage points compared to fixed scalar thresholds, with a $20\%$ decrease in false positives (Fayzi et al., 2023).
Multi-label semi-supervised learning: Class-aware pseudo-labeling achieved a 10–15 point gain in overall-F1 at initial epochs and maintained superior mAP across standard datasets at low labeling rates (Xie et al., 2023).
Outlier-robust geometric estimation: The TIVM estimator exhibited outlier tolerance up to 70–90%, with convergence in 3–15 iterations, consistently outperforming RANSAC and GNC in speed and robustness (Sun, 2022).
Online object recognition: Adaptive thresholds yielded 12–45% improvement in accuracy and dominated ROC metrics relative to fixed choices (Bohara, 2020).
OOD detection: DCAC reduced FPR95 by $6.55\%$ on ImageNet OOD benchmarks when paired with baseline detectors, with minimal inference overhead (Wu et al., 18 Jan 2026).

5. Generalization Guarantees and Theoretical Insights

Several frameworks provide formal guarantees. In class-aware pseudo-labeling, Hoeffding’s inequality bounds the deviation between estimated and true class proportions, with convergence rates $O_p(1/\sqrt{n})$ (Xie et al., 2023). The downstream generalization gap is bounded by the pseudo-labeling error $\epsilon$, complexity measures (e.g., Rademacher complexity), and sample size, decomposing the risk into model and labeling error components. In optimal thresholding for linear classifiers, the class-balanced threshold $\tau^* = \mu$ (minority class fraction) is shown to be provably optimal under moment-matching (Hong et al., 2016).
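
To make the $O_p(1/\sqrt{n})$ rate concrete, the textbook form of Hoeffding's inequality can be worked through for one class. This assumes $n$ i.i.d. labeled samples with binary indicators $y_{ic} \in \{0, 1\}$ and $\hat{p}_c = \frac{1}{n}\sum_{i=1}^n y_{ic}$; the deviation $\varepsilon$ and confidence level $\delta$ are symbols introduced here, and the exact constants in Xie et al. (2023) may differ.

$\Pr\left( |\hat{p}_c - p_c| \ge \varepsilon \right) \le 2\exp(-2n\varepsilon^2)$

Setting the right-hand side equal to $\delta$ and solving for $\varepsilon$ gives

$\varepsilon = \sqrt{\frac{\ln(2/\delta)}{2n}}$

For example, with $n = 1000$ labeled samples and $\delta = 0.05$, $\varepsilon = \sqrt{\ln(40)/2000} \approx 0.043$, so the estimated class proportion lies within about 4.3 percentage points of the true value with probability at least $95\%$.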

6. Limitations and Practical Considerations

Most approaches presume that class-conditional or residual-score distributions are sufficiently separated (bi-modality for Otsu methods), or that empirical priors are well estimated from limited labels. Failure of separation, heavy-tailed or multi-modal distributions, or shifting priors may degrade thresholding efficacy, potentially requiring additional post-hoc tuning (e.g., layer number in TIVM (Sun, 2022)). Computationally, the primary cost arises from per-class histogram or similarity computations, but practical implementations restrict updates to batch-wise or sliding-window events, ensuring low latency (Bohara, 2020, Wu et al., 18 Jan 2026).

A plausible implication is that in highly non-stationary or extremely imbalanced regimes, hybrid or regularized thresholding strategies that combine data-driven and structural priors may yield further robustness.

7. Extensions and Future Research Directions

Ongoing lines of work include extending dynamic thresholding to meta-learning setups, integrating explicit cost matrices for multi-class problems, and combining threshold selection with uncertainty quantification and calibration. Adaptive thresholding modules can be further embedded into transformer architectures or dynamically pruned computation graphs for more general resource-constrained inference.

Another avenue is the incorporation of explicit temporal adaptation for streaming or lifelong learning, refining thresholds as distributions drift. Lastly, theoretical connections between optimal thresholding, information-theoretic impurity measures, and margin theory in deep networks remain fertile ground for further formalization.

Dynamic and class-aware thresholding provides a principled, empirically validated set of techniques for adapting model behavior to the underlying data structure, class distribution, and task constraints. By replacing static thresholds with data-informed, context-sensitive cutoffs, these frameworks improve the robustness, efficiency, and fairness of modern recognition, segmentation, and detection systems (Görmez et al., 2022, Xie et al., 2023, Sun, 2022, Bohara, 2020, Hong et al., 2016, Fayzi et al., 2023, Wu et al., 18 Jan 2026).