
Cost-Aware Selective Classification

Updated 18 January 2026
  • Cost-aware selective classification is a machine learning paradigm that balances prediction accuracy with computational, measurement, and financial costs under strict resource constraints.
  • It employs methodologies such as integer linear programming, reinforcement learning, multi-stage cascades, and mixed-integer programming to address heterogeneous cost challenges in model invocation and feature acquisition.
  • Empirical findings in areas like medical diagnostics and computer vision show significant cost reductions with minimal loss in accuracy, highlighting its practical utility in resource-constrained applications.

Cost-aware selective classification is a machine learning paradigm in which the learning system is explicitly optimized to maximize predictive performance under resource constraints, enabling strategic trade-offs between accuracy and computational, measurement, or financial cost. Unlike traditional selective classification, which focuses on abstaining or rejecting uncertain predictions to improve reliability, cost-aware selective classification addresses heterogeneous costs arising from model invocation, feature or covariate acquisition, or downstream decision interventions. Methodologies span portfolio optimization over classifier ensembles, reinforcement learning for adaptive feature acquisition, mixed-integer programming using uncertainty-aware thresholds, and multi-stage classifier design with instance-dependent reject rules. This area is central to applications with hard budget constraints or variable instance complexity, such as medical diagnostics, real-time vision, and resource-constrained embedded systems.

1. Formal Problem Statements and Taxonomy

Cost-aware selective classification problems can be rigorously formulated in several distinct settings:

  • Model Portfolio Assignment: Given a set of $N$ queries $Q = \{x_1, \dots, x_N\}$ and $M$ pre-trained classifiers $\{f_i\}_{i=1}^M$ with known per-inference costs $b_i > 0$, the objective is to assign exactly one classifier to each $x_j$ so as to maximize total expected accuracy under a total cost budget $B$. The core ILP is:

$$\begin{aligned} \max_{t \in \{0,1\}^{M \times N}} &\quad \sum_{j=1}^N \sum_{i=1}^M \widehat{SP}_i(x_j)\, t_{i,j} - \lambda \sum_{i=1}^M \sigma_i \sum_{j=1}^N t_{i,j} \\ \text{s.t.} &\quad \sum_{i=1}^M t_{i,j} = 1, \quad \forall j \\ &\quad \sum_{i=1}^M \sum_{j=1}^N b_i\, t_{i,j} \leq B \end{aligned}$$

where $\widehat{SP}_i(x_j)$ estimates the per-instance accuracy and $\sigma_i$ is a regularizer reflecting estimator variance (Ding et al., 2024).

  • Costly Feature Acquisition: Each feature $f_j$ has cost $c_j > 0$; the classifier sequentially acquires features and ultimately predicts $y$ for each instance. The objective minimizes the total expected cost, defined as the sum of feature costs and misclassification loss, subject to a per-instance budget or a global constraint:

$$\min_{\theta}\; \mathbb{E}_{(x,y)}\Big[\ell(y_\theta(x), y) + \sum_{j \in \mathcal{S}_\theta(x)} c_j\Big] \quad \text{s.t.} \quad \mathbb{E}_{(x,y)}\Big[\sum_{j \in \mathcal{S}_\theta(x)} c_j\Big] \leq b,$$

where $\mathcal{S}_\theta(x)$ is the set of features selected by the policy for instance $x$ (Andrade et al., 2020, Janisch et al., 2017, Janisch et al., 2019).

  • Multi-Stage Cascades and Reject Option: Classifiers are organized into ordered stages or cost tiers. A chained decision process either predicts or rejects an input at each stage, incurring the cost of additional measurements or model evaluations if rejected:

$$\min_{f^1,\ldots,f^K}\; \frac{1}{N} \sum_{i=1}^N \sum_{k=1}^K S_i^k \Big[ \mathbf{1}\{f^k(z_k^{(i)}) \neq y_i\} + c_{k+1}\, \mathbf{1}\{f^k(z_k^{(i)}) = r\} \Big]$$

where $S_i^k$ indicates whether instance $i$ reaches stage $k$ and $r$ is the reject output (Trapeznikov et al., 2012, Xu et al., 2022).

This taxonomy demonstrates the generality of cost-aware selective classification, encompassing discrete model portfolios, sequential feature acquisition, multi-stage gating, and abstention with explicit budgeting.
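For a toy instance small enough to enumerate, the portfolio ILP above can be solved by brute force. The sketch below (with illustrative success probabilities and costs, not from any cited system) makes the objective and the two constraints concrete; a real deployment would hand the same program to an ILP solver such as HiGHS.

```python
from itertools import product

def best_assignment(sp, costs, budget, lam=0.0, sigma=None):
    """Brute-force the portfolio ILP for a tiny instance.

    sp[i][j]   -- estimated success probability of model i on query j
    costs[i]   -- per-inference cost b_i of model i
    budget     -- total cost budget B
    lam, sigma -- variance-regularization term (lambda * sigma_i per use)
    """
    M, N = len(sp), len(sp[0])
    sigma = sigma or [0.0] * M
    best, best_obj = None, float("-inf")
    # Enumerate every assignment of one model per query (M^N options),
    # which enforces the sum_i t_{i,j} = 1 constraint by construction.
    for assign in product(range(M), repeat=N):
        if sum(costs[i] for i in assign) > budget:
            continue  # violates the budget constraint
        obj = sum(sp[i][j] - lam * sigma[i] for j, i in enumerate(assign))
        if obj > best_obj:
            best, best_obj = assign, obj
    return best, best_obj

# Two models (cheap/weak vs. expensive/strong), three queries, budget 4.
sp = [[0.6, 0.7, 0.9],    # cheap model, cost 1
      [0.9, 0.95, 0.92]]  # strong model, cost 2
assign, obj = best_assignment(sp, costs=[1, 2], budget=4)
```

Here the budget permits at most one use of the strong model, and the search routes it to the query where the expected accuracy gain is largest.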

2. Key Methodological Approaches

Model Portfolio Optimization

OCCAM (Ding et al., 2024) introduces white-box portfolio optimization over $M$ pre-trained classifiers. Per-query assignment is formalized as an integer linear program (ILP) leveraging an unbiased, low-variance nearest-neighbor estimator of model accuracy, $\widehat{SP}_i(x)$, which exploits Lipschitz continuity in deep feature space. The method globally optimizes expected accuracy under a cost budget using an industrial ILP solver (HiGHS).

Adaptive Feature Acquisition and Sequential Decision Processes

Deep RL-based strategies (Janisch et al., 2017, Janisch et al., 2019) recast feature acquisition as a Markov Decision Process (MDP). At each step, the agent selects either a feature to acquire (paying cost) or issues a prediction. Policies are learned via Deep Q-Learning, enabling adaptive, instance-specific acquisition and robust management of hard or average budget constraints. Extensions include hard budget masking, missing data handling, and the integration of high-performance full-feature classifiers.
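The MDP framing can be made concrete with a minimal environment sketch (the class, costs, and classifier below are illustrative, not the authors' implementation): each step either acquires one feature at its cost or terminates with a prediction, with costs expressed as negative rewards so that standard Q-learning applies unchanged.

```python
class FeatureAcquisitionMDP:
    """Minimal MDP for costly feature acquisition.

    State: mask of acquired features plus their observed values.
    Actions 0..d-1 acquire feature j (paying c_j); action d
    terminates and predicts, paying a misclassification loss.
    """
    def __init__(self, x, y, costs, predict_fn, loss=1.0):
        self.x, self.y, self.costs = x, y, costs
        self.predict_fn, self.loss = predict_fn, loss
        self.acquired = [False] * len(x)

    def _obs(self):
        # Unacquired features are masked as None.
        return [v if a else None for v, a in zip(self.x, self.acquired)]

    def step(self, action):
        d = len(self.x)
        if action < d:                      # acquire feature `action`
            self.acquired[action] = True
            return self._obs(), -self.costs[action], False
        y_hat = self.predict_fn(self._obs())
        reward = 0.0 if y_hat == self.y else -self.loss
        return None, reward, True           # episode ends on prediction

# Toy classifier: majority sign over whatever features have been bought.
clf = lambda obs: int(sum(v for v in obs if v is not None) > 0)
env = FeatureAcquisitionMDP(x=[1, -1, 1], y=1, costs=[0.1, 0.2, 0.3],
                            predict_fn=clf)
obs, r, done = env.step(0)     # buy feature 0, pay cost 0.1
obs2, r2, done2 = env.step(3)  # predict from the single acquired feature
```

A learned policy over this interface decides, per instance, which features are worth their price before committing to a prediction.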

Multi-Stage Reject Cascade Design

The boosting-based multi-stage design (Trapeznikov et al., 2012) and UnfoldML (Xu et al., 2022) implement hierarchies of classifiers, where early stages use cheap features and later stages handle rejections using more expensive modalities. Reject rules are derived via explicit parameterization (biased positive/negative classifiers) or uncertainty-based gating (hard or soft). Global surrogate risk minimization enables end-to-end optimization, achieving close to centralized accuracy at drastically reduced cost.
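A minimal two-stage instance of such a cascade, assuming a simple confidence-threshold reject rule and illustrative per-stage costs (the models and threshold below are hypothetical):

```python
def cascade_predict(x, cheap, expensive, conf, tau, c1=1.0, c2=10.0):
    """Two-stage cascade with a confidence-based reject rule.

    Stage 1 runs the cheap model; if its confidence falls below the
    gating threshold tau, the input is rejected to the expensive model.
    Returns (prediction, cost incurred).
    """
    y1 = cheap(x)
    if conf(x) >= tau:
        return y1, c1                 # accept the cheap prediction
    return expensive(x), c1 + c2      # reject: pay for both stages

# Toy models on scalar inputs: confidence = |x|, gating threshold 0.5.
cheap = lambda x: int(x > 0)
expensive = lambda x: int(x >= 0)     # stronger model, 10x the cost
pred, cost = cascade_predict(0.9, cheap, expensive, conf=abs, tau=0.5)
pred2, cost2 = cascade_predict(0.1, cheap, expensive, conf=abs, tau=0.5)
```

Confident inputs exit at stage 1 for unit cost; only the ambiguous input near the decision boundary pays for the expensive second stage.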

Mixed-Integer Programming with Uncertainty Measures

A complementary approach uses mixed-integer programming (MIP) to delineate optimal reject regions based on predictive mean and model uncertainty (e.g., MC-Dropout variance) (Yildirim et al., 2019). The MIP jointly allocates predictions/rejections to maximize performance under cost or rejection constraints, producing provably optimal thresholding strategies.
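On small problems, the effect of such a MIP can be approximated by exhaustively searching threshold pairs over predictive score and uncertainty. The sketch below is a stand-in under that assumption (illustrative data, not the paper's formulation), accepting points via a rule of the form score $\geq s_{lo}$ and uncertainty $\leq u_{hi}$:

```python
def best_reject_region(scores, uncert, labels, max_reject_frac=0.3):
    """Exhaustively search accept rules (score >= s_lo, uncertainty <= u_hi),
    maximizing accuracy on accepted points under a rejection-rate cap.
    A brute-force stand-in for the MIP, feasible on small candidate grids."""
    n = len(labels)
    cands_s = sorted(set(scores)) + [float("inf")]
    cands_u = sorted(set(uncert)) + [float("inf")]
    best = (0.0, None)
    for s_lo in cands_s:
        for u_hi in cands_u:
            accept = [i for i in range(n)
                      if scores[i] >= s_lo and uncert[i] <= u_hi]
            if n - len(accept) > max_reject_frac * n or not accept:
                continue  # too many rejections, or nothing accepted
            # Predict positive when score >= 0.5; score accuracy on accepted.
            acc = sum((scores[i] >= 0.5) == labels[i]
                      for i in accept) / len(accept)
            if acc > best[0]:
                best = (acc, (s_lo, u_hi))
    return best

# One high-uncertainty point is also the one misclassified point.
scores = [0.9, 0.8, 0.55, 0.4]
uncert = [0.1, 0.1, 0.9, 0.2]
labels = [1, 1, 0, 0]
acc, rule = best_reject_region(scores, uncert, labels)
```

The search rejects exactly the high-uncertainty (and wrong) point, lifting accuracy on the accepted set to 100% within the rejection budget.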

Bayes-Optimal and Tractable Cost-Total Minimization

The Bellman-optimal adaptive acquisition policy (Andrade et al., 2020) recursively minimizes expected total risk (misclassification plus acquisition cost) via dynamic programming. Computational approximations leverage generalized additive models (GAMs) and nested feature-set sequences to collapse high-dimensional expectations to tractable 1-D integrals. This yields adaptive forward-selection strategies that match or exceed baseline heuristics on recall, false discovery rate, and cost criteria.
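A one-dimensional sketch of the backward Bellman recursion, assuming a fixed nested feature sequence with known per-stage expected error rates (the numbers below are illustrative):

```python
def optimal_stopping(err, costs):
    """Backward Bellman recursion over a nested feature sequence.

    err[k]   -- expected misclassification loss when predicting after
                acquiring the first k features (err[0] = no features)
    costs[k] -- cost of acquiring feature k+1
    Returns the optimal expected total cost and the stopping stage.
    A 1-D stand-in for the general DP, which is intractable over
    arbitrary feature subsets.
    """
    K = len(costs)
    value, stop = err[K], K           # at the last stage, must predict
    for k in range(K - 1, -1, -1):    # backward induction
        continue_cost = costs[k] + value
        if err[k] <= continue_cost:   # predicting now is no worse
            value, stop = err[k], k
        else:                         # acquiring the next feature pays off
            value = continue_cost
    return value, stop

# Errors shrink as features are bought, but with diminishing returns:
# after the first feature, further acquisitions cost more than they save.
val, stage = optimal_stopping(err=[0.5, 0.3, 0.25, 0.24],
                              costs=[0.1, 0.1, 0.1])
```

The recursion stops after one feature: the 0.05 error reduction from a second feature does not cover its 0.1 acquisition cost.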

3. Accuracy, Cost, and Robustness: Empirical Findings

Experimental evaluations across vision, tabular, and medical tasks consistently demonstrate that cost-aware selective classification methods result in marked reductions in resource expenditure with minimal loss in accuracy.

  • OCCAM (Ding et al., 2024):
    • On CIFAR-10 and Tiny ImageNet, achieves up to 40% cost reduction with a $<1\%$ drop in accuracy, outperforming single-best and regression-forest baseline routers.
    • Pareto curves confirm OCCAM's assignments strictly dominate baselines over the accuracy/cost spectrum.
    • Robust to the estimator sample-size parameter $K$, with performance stabilizing by $K \approx 20$.
  • UnfoldML (Xu et al., 2022):
    • In clinical sepsis/shock prediction, attains $19.6\times$ cost savings with $<0.1\%$ AUC loss; improves early disease detection timing.
    • In coarse-to-fine image classification (CIFAR-100), delivers $4.7$–$6.8\times$ MAC savings for $<1\%$ accuracy reduction.
  • Deep RL Approaches (Janisch et al., 2017, Janisch et al., 2019):
    • Consistently dominate or match the best previous budgeted feature acquisition methods across multiple benchmarks.
    • Support direct optimization of cost/accuracy trade-off curves in both average and hard budget regimes.
  • Bayes-Optimal and GAM-based Policies (Andrade et al., 2020):
    • On medical datasets, adaptive acquisition strategies yield the lowest expected total cost while consistently meeting recall constraints.
    • Outperform both static feature-selection and prior adaptive methods in terms of cost and false discovery control.
  • Mixed-Integer Programming (Yildirim et al., 2019):
    • Provides up to 15 percentage point improvements over reject-by-uncertainty baselines in classification quality.
    • Enables cost-sensitive online fraud management with substantial profit gains relative to industry heuristics.

4. Theoretical Properties and Algorithmic Guarantees

Cost-aware selective classification induces complex joint optimization over prediction policies and resource allocation:

  • Estimator Guarantees: OCCAM's nearest-neighbor accuracy estimator is asymptotically unbiased, with variance decaying as $1/\sqrt{K}$ under standard Lipschitz/separation hypotheses (Ding et al., 2024).
  • Surrogate Risk Optimization: Multi-stage boosting frameworks converge to stationary points of global smooth surrogate objectives. Generalization bounds in the two-stage case decompose total error into empirical margin violations at each stage and margin-based complexity terms (Trapeznikov et al., 2012).
  • MDP/Formal Optimality: RL-based approaches inherit Bellman-optimality for both average and hard budget constraints, providing direct policy improvement guarantees (Janisch et al., 2019, Janisch et al., 2017).
  • Tractability vs. Bayes Optimality: The exact dynamic program has intractable combinatorial complexity; tractable approximations (e.g., GAM-based 1-D projections, monotone feature sequences) achieve near-optimal practical performance (Andrade et al., 2020).
  • MIP Solutions: MIP-based reject region design finds globally optimal assignment of acceptance/rejection under complex cost and uncertainty threshold constraints (Yildirim et al., 2019).
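The flavor of the nearest-neighbor accuracy estimator can be illustrated in one dimension (hypothetical validation data; OCCAM computes distances in deep feature space): the model's per-instance accuracy at a query is estimated as the fraction of its $K$ nearest validation neighbors that the model classified correctly.

```python
def knn_success_prob(x, val_feats, val_correct, K=3):
    """Nearest-neighbor estimate of a model's per-instance accuracy:
    the fraction of the K validation points nearest to x that the
    model classified correctly. Distances here are 1-D absolute
    differences purely for illustration."""
    order = sorted(range(len(val_feats)), key=lambda i: abs(val_feats[i] - x))
    return sum(val_correct[i] for i in order[:K]) / K

# Validation features and whether the model got each point right:
# the model is reliable near 0 and unreliable near 1.
feats   = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
correct = [1,   1,   1,   0,   0,   1]
p_near0 = knn_success_prob(0.05, feats, correct, K=3)  # neighbors 0, 1, 2
p_near1 = knn_success_prob(1.05, feats, correct, K=3)  # neighbors 3, 4, 5
```

Queries landing in the region where the model succeeded on validation data receive high estimated accuracy, and vice versa; larger $K$ trades locality for lower estimator variance.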

5. Representative Application Domains

Cost-aware selective classification directly impacts domains where resource constraints are prominent or error costs non-uniform:

  • Medical Diagnostics: Adaptive lab tests, early disease monitoring, and cost-constrained triage exploit per-instance selective acquisition to maximize clinical utility while satisfying recall or cost regulatory constraints (Andrade et al., 2020, Ding et al., 2024, Xu et al., 2022).
  • Computer Vision and Natural Language Processing: Model portfolios and cost-aware gating allow scaling deep models to deployment on heterogeneous hardware budgets (Ding et al., 2024, Xu et al., 2022).
  • Fraud Detection and Security: Online selective abstention and risk-aware thresholding enable precise control over rejection/acceptance in high-stakes, cost-sensitive scenarios (Yildirim et al., 2019).
  • Embedded Systems and Edge ML: Instance-adaptive cascades reduce sensor or compute usage under strict power or latency budgets (noted in multi-stage design and UnfoldML).

6. Insights, Limitations, and Future Directions

Multiple methodological themes emerge:

  • White-box vs. Black-box Policies: Explicitly modeled instance-wise success probabilities (OCCAM, Bayes-optimal policies) yield provable accuracy/cost trade-off guarantees, in contrast with generic meta-learned routers.
  • Dynamic Budgeting: Approaches can accommodate online, streaming, or time-varying budget profiles by warm-starting the optimization or incrementally re-solving (ILP/RL) (Ding et al., 2024).
  • Scalability and Extensibility: While ILP and MIP can address moderate-scale problems exactly, large-scale or real-time applications benefit from approximate solutions or hierarchical decompositions (e.g., soft-gating, DKD distillation in UnfoldML).
  • Multi-objective Optimization: The core frameworks readily generalize to objectives beyond accuracy and cost, including fairness constraints, latency bounds, and energy considerations (Ding et al., 2024).
  • Limitations: Current frameworks for cost-aware rejection focus primarily on binary or multiclass settings; structured or regression tasks pose additional challenges. Moreover, performance depends critically on uncertainty quantification and the accuracy of per-instance accuracy estimators. Some methods assume well-separated feature spaces and reliable calibration, which may not always hold (Xu et al., 2022, Andrade et al., 2020, Ding et al., 2024).

This suggests that further research should rigorously benchmark estimator calibration, extend these frameworks robustly to multi-label and regression settings, and integrate continual learning to handle shifting distributions and costs.

