Cost-Sensitive Sequential Acquisition

Updated 28 April 2026

Cost-sensitive sequential acquisition is a framework for dynamically selecting costly features or tests to optimize classification accuracy and utility under explicit budget constraints.
It unifies methods from reinforcement learning, bandit models, and active learning to rigorously balance acquisition costs against performance improvements.
Practical methodologies, including RL-based policies and Bayesian optimization, demonstrate measurable cost reduction and enhanced decision-making in areas like healthcare and sensor networks.

Cost-sensitive sequential acquisition refers to the class of algorithms and theoretical frameworks that address the problem of dynamically deciding which information (e.g., features, tests, expert labels, experiments, or actions) to acquire—at each step and for each instance—when each acquisition incurs a cost, with the aim of optimizing a downstream objective such as classification accuracy, utility, or expected reward under an explicit budget or cost-performance trade-off. The literature unifies developments in reinforcement learning, active learning, bandit models, Bayesian optimization, and information design, supplying rigorous guarantees, practical algorithms, and insights into sample efficiency and optimal stopping.

1. Foundational Problem Formulations

Cost-sensitive sequential acquisition generalizes classical learning and decision paradigms by adding explicit, per-action or per-query costs and allowing adaptive, instance-dependent querying. The prototypical formalizations include:

Cost-sensitive sequential classification: At each round, the agent may acquire a subset of features (each with its own cost), and the process stops either when the budget is exhausted or when a confidence/utility threshold is met (Kachuee et al., 2019, Janisch et al., 2019, Contardo et al., 2016).
Multi-armed bandit/design of experiments: Each information source (arm) has state-dependent or stochastic exploration costs, and the objective is to maximize expected reward from exploitation subject to a total cost constraint. Extensions to switching/setup costs and combinatorial selection appear in bandit superprocesses and combinatorial acquisition (0805.2630, Chawla et al., 2024).
Cost-sensitive active learning: The learner actively selects which label or cost vector to query to minimize long-run expected risk, with optimality measured in excess cost and sample complexity (Njike et al., 2023).
Resource-constrained Bayesian optimization: Hyperparameter optimization is cast as a sequential process of deciding which configurations to continue, checkpoint, or stop, based on the predicted improvement in utility minus cost. Freeze-thaw and cost-sensitive extensions instantiate this idea (Lee et al., 24 Oct 2025).
Expert-involved or domain knowledge acquisition: Queries to external experts or data sources are decided adaptively under explicit costs; multi-agent RL and positive-unlabeled learning select the most cost-effective interventions (Wu et al., 24 Aug 2025).
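The multi-armed bandit formulation in the list above can be made concrete with a small budgeted loop. Every pull of arm $i$ costs $c_i$, and the agent stops when the remaining budget cannot pay for any arm. This is a minimal sketch under stated assumptions. The selection rule (a UCB1 index capped at 1 and divided by the arm's cost), the Bernoulli reward model, the arm parameters, and the name `budgeted_ucb` are all illustrative choices. It is not the LP-relaxation or superprocess algorithm of the cited works.

```python
import math
import random


def budgeted_ucb(means, costs, budget, seed=0):
    """Hypothetical budgeted bandit loop with Bernoulli rewards and fixed per-pull costs.

    Among arms that still fit in the remaining budget, pull the one with the
    largest optimistic reward-per-cost ratio (UCB1 index, capped at 1,
    divided by the arm's cost). Each affordable arm is tried once first.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    spent, reward, t = 0.0, 0.0, 0

    def ratio(i):
        mean = totals[i] / pulls[i]
        bonus = math.sqrt(2.0 * math.log(t) / pulls[i])
        return min(1.0, mean + bonus) / costs[i]

    while True:
        affordable = [i for i in range(k) if spent + costs[i] <= budget]
        if not affordable:
            break  # stopping rule: the remaining budget cannot pay for any arm
        t += 1
        untried = [i for i in affordable if pulls[i] == 0]
        arm = untried[0] if untried else max(affordable, key=ratio)
        r = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli reward
        pulls[arm] += 1
        totals[arm] += r
        spent += costs[arm]
        reward += r
    return reward, spent, pulls


# Hypothetical arms: the third has the highest mean but triple the cost.
reward, spent, pulls = budgeted_ucb(
    means=[0.2, 0.5, 0.6], costs=[1.0, 1.0, 3.0], budget=200.0
)
```

With these made-up parameters, the cheap arm with mean 0.5 is pulled far more often than the costlier arm with mean 0.6. Its expected reward per unit cost (0.5) is well above the other arm's (0.2), which illustrates why cost must enter the selection rule and not only the stopping rule.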

Mathematically, these problems are typically modeled as Markov Decision Processes (MDPs), Partially Observable MDPs (POMDPs), or bandit processes augmented with budgets and cost-sensitive reward signals.
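The MDP view can be illustrated with a minimal episodic environment for a single instance:

- The state is the set of features acquired so far.
- Each action either acquires one feature or predicts.
- The reward charges $\lambda$ times the feature cost per acquisition and $-1$ for a wrong prediction.

The class name `AcquisitionEnv`, this specific reward shaping, and the toy classifier are assumptions for illustration. They are not the exact formulation of any cited paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple, Union

Action = Union[int, str]  # a feature index, or the terminal action "predict"


@dataclass
class AcquisitionEnv:
    """Hypothetical cost-sensitive feature-acquisition MDP for one instance.

    State: features acquired so far (index -> value).
    Actions: an int j acquires feature j; "predict" ends the episode.
    Reward: -lam * costs[j] per acquisition; 0 for a correct and -1 for a
    wrong final prediction.
    """
    x: List[float]
    y: int
    costs: List[float]
    classifier: Callable[[Dict[int, float]], int]
    lam: float = 1.0
    observed: Dict[int, float] = field(default_factory=dict)
    done: bool = False

    def step(self, action: Action) -> Tuple[float, bool]:
        if self.done:
            raise RuntimeError("episode has already terminated")
        if action == "predict":
            self.done = True
            correct = self.classifier(self.observed) == self.y
            return (0.0 if correct else -1.0), True
        if action in self.observed:
            raise ValueError(f"feature {action} was already acquired")
        self.observed[action] = self.x[action]  # reveal the feature value
        return -self.lam * self.costs[action], False


def sum_sign_classifier(observed: Dict[int, float]) -> int:
    """Hypothetical classifier: predict 1 if the observed values sum to > 0."""
    return int(sum(observed.values()) > 0)


env = AcquisitionEnv(x=[0.7, -0.2, 1.5], y=1, costs=[0.1, 0.05, 0.3],
                     classifier=sum_sign_classifier, lam=1.0)
episode_return = 0.0
for a in [0, 1, "predict"]:  # a fixed, hand-written policy
    r, done = env.step(a)
    episode_return += r
```

Here the fixed policy acquires features 0 and 1 and then predicts correctly, so the episode return is $-(0.1 + 0.05) = -0.15$. An RL agent would instead learn from such returns which features to acquire and when to predict.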

2. Algorithmic Frameworks

A broad array of cost-sensitive sequential acquisition algorithms has emerged. They are unified by explicit cost modeling, adaptive action choice conditioned on the observed data and past actions, and principled stopping criteria. Key paradigms include:

Reinforcement learning approaches: Deep Q-learning, double dueling DQN, actor-critic, and hierarchical RL are prevalent. States encode the observed data subset, either masked or via learned representations; actions correspond to feature/test acquisitions or a terminal predict/stop action. Reward functions penalize acquisition cost and classification error, with the trade-off controlled by a weighting coefficient λ or by direct budget constraints (Janisch et al., 2019, Contardo et al., 2016, Kachuee et al., 2019, Li et al., 2024).
Non-greedy, lookahead, or oracle-based policies: Acquisition-Conditioned Oracle (ACO) and NOCTA directly enumerate (or sample) possible future acquisition sets. They evaluate the cost-performance trade-offs of these sets via nearest-neighbor or parametric utility estimates, thereby avoiding myopic, one-step greedy selection (Valancius et al., 2023, Dinh et al., 16 Jul 2025).
Freeze-thaw and multi-fidelity Bayesian optimization: Surrogate models (e.g., PFNs) predict learning-curve extrapolations under partial budgets, and acquisition decisions maximize Expected Utility Improvement (EUI) relative to cost-saturated utility (Lee et al., 24 Oct 2025).
Optimistic dynamic programming (ODP) and active learning: Upper confidence bounds and bias-variance-controlled uncertainty drive selection of maximally informative, cost-effective queries; cells and candidate sets are pruned as confidence intervals shrink (Njike et al., 2023, Atan et al., 2016).
Cost amortization and local-to-global approximations: Recent work introduces cost-waterfilling and committal Markov chain reductions to enable global approximation bounds for combinatorial adaptive acquisition under matroid or packing constraints (Chawla et al., 2024).
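The non-greedy, oracle-style idea from the list above can be sketched in a few lines. The sketch:

1. Enumerates every subset $S$ of the still-unobserved features.
2. Estimates the error of predicting with the observed features plus $S$, using training rows that are nearest to the current instance on what is already known.
3. Picks the subset minimizing estimated error plus $\lambda$ times its cost.

This is loosely modeled on the acquisition-conditioned oracle, not the authors' procedure. The kNN error estimate, the names `knn_predict` and `choose_subset`, and the toy data are all assumptions.

```python
import itertools

import numpy as np


def knn_predict(X, y, query, feats, k, exclude=None):
    """Majority vote of the k nearest training rows, using only columns `feats`."""
    d = np.linalg.norm(X[:, feats] - query[feats], axis=1)
    if exclude is not None:
        d[exclude] = np.inf  # leave the query row out of its own vote
    idx = np.argsort(d, kind="stable")[:k]
    return np.bincount(y[idx]).argmax()


def choose_subset(X, y, costs, x_obs, observed, lam, k_nbrs=5, k_clf=3):
    """Hypothetical ACO-style non-greedy subset search.

    Scores each subset S of unobserved features by
    (estimated error using observed + S) + lam * cost(S).
    """
    obs = np.asarray(observed, dtype=int)
    obs_set = set(observed)
    unobserved = [j for j in range(X.shape[1]) if j not in obs_set]
    # Neighbours of the current instance, measured on the features already known.
    d = np.linalg.norm(X[:, obs] - x_obs[obs], axis=1)
    nbrs = np.argsort(d, kind="stable")[:k_nbrs]
    best, best_score = (), np.inf
    for r in range(len(unobserved) + 1):
        for subset in itertools.combinations(unobserved, r):
            feats = np.asarray(list(observed) + list(subset), dtype=int)
            errors = [knn_predict(X, y, X[n], feats, k_clf, exclude=n) != y[n]
                      for n in nbrs]
            score = float(np.mean(errors)) + lam * sum(costs[j] for j in subset)
            if score < best_score:
                best, best_score = subset, score
    return best, best_score


# Hypothetical data: feature 0 determines the label, feature 1 does not.
n = 20
y = np.arange(n) % 2
f0 = np.where(y == 1, 1.0, -1.0) * (1.0 + 0.1 * np.arange(n))
f1 = (0.37 * np.arange(n)) % 1.0
X = np.column_stack([f0, f1])
x_new = np.full(2, np.nan)  # nothing acquired yet, so every value is unknown
subset, score = choose_subset(X, y, costs=[0.01, 0.01], x_obs=x_new,
                              observed=[], lam=1.0)
```

On this toy data, the search returns the informative feature `(0,)` with score 0.01. Acquiring nothing leaves most neighbours misclassified, and adding feature 1 only adds cost. A policy built on this would acquire a feature from the chosen subset, update `observed`, and re-plan, predicting once the empty subset wins. Full enumeration is exponential in the number of unobserved features, so larger problems call for sampling candidate subsets instead.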

There are also specialized methods, such as directed acyclic graph (DAG) policies for sensor selection (Wang et al., 2015) and doubly robust Q-learning for policy learning with missing data (Zhou et al., 13 Apr 2026).

3. Utility Functions, Stopping Criteria, and Theoretical Guarantees

A defining element of cost-sensitive sequential acquisition is the explicit use of a utility or objective function $U$ quantifying the trade-off between accumulated performance and total cost. In hyperparameter optimization, for example:

$U(b, \tilde y_b) = \tilde y_b - \alpha \left(\frac{b}{B}\right)^c,$

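Here $\tilde y_b$ is the best performance observed after spending budget $b$, $B$ is the total budget, and $\alpha$ and $c$ set the scale and curvature of the cost penalty. For $\alpha, c > 0$, $U$ increases monotonically with $\tilde y_b$ and decreases with the cumulative budget $b$ (Lee et al., 24 Oct 2025). Optimal stopping is addressed by normalized regret estimates and adaptive thresholds based on probabilistic models of future improvement, such as: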

$\hat R_b = \frac{\hat U_{\max}-U_p}{\hat U_{\max}-\hat U_{\min}},$

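Here $U_p$ is the utility attained so far, and $\hat U_{\max}$ and $\hat U_{\min}$ are model-based estimates of the upper and lower ends of the attainable utility range. The process terminates once this estimated regret falls below an adaptively set threshold (Lee et al., 24 Oct 2025).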
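The utility and stopping quantities above are simple enough to compute directly. The following sketch evaluates $U(b, \tilde y_b) = \tilde y_b - \alpha (b/B)^c$ along a learning curve and applies the normalized-regret test $\hat R_b = (\hat U_{\max} - U_p)/(\hat U_{\max} - \hat U_{\min})$ with a fixed threshold. Several inputs are assumptions:

- The best-so-far curve and the threshold values are made up.
- $U_p$ is read as the utility attained so far.
- The estimates $\hat U_{\max}$ and $\hat U_{\min}$ are supplied by hand. In the cited method they come from a probabilistic model of future improvement, and the threshold is adaptive.

```python
def utility(b, y_best, B, alpha, c):
    """U(b, y) = y - alpha * (b / B) ** c: best performance minus a budget penalty."""
    return y_best - alpha * (b / B) ** c


def normalized_regret(u_max_hat, u_p, u_min_hat):
    """R_hat = (U_max_hat - U_p) / (U_max_hat - U_min_hat)."""
    return (u_max_hat - u_p) / (u_max_hat - u_min_hat)


def should_stop(u_max_hat, u_p, u_min_hat, threshold):
    # Fixed threshold for illustration only; the cited method adapts it.
    return normalized_regret(u_max_hat, u_p, u_min_hat) <= threshold


# Hypothetical best-so-far accuracies after each of B = 10 budget steps.
curve = [0.50, 0.65, 0.72, 0.76, 0.775, 0.79, 0.795, 0.80, 0.80, 0.80]
utils = [utility(b, y, B=10, alpha=0.2, c=1.0)
         for b, y in enumerate(curve, start=1)]
best_b = max(range(1, 11), key=lambda b: utils[b - 1])  # utility-maximizing budget
```

On this made-up curve, utility peaks at $b = 4$ ($0.76 - 0.2 \times 0.4 = 0.68$) even though accuracy keeps rising afterwards. A stopping rule is meant to detect exactly this situation. In practice the rest of the curve is unknown when the decision is made, which is why it has to be predicted.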

Theoretical results span:

Regret analysis: Distribution-independent regret bounds that grow sublinearly in the number of rounds or queries; for Seq-OOS, regret scales as $O(\sqrt{T \log T})$ (Atan et al., 2016).
Convergence rates: In cost-sensitive active learning under a refined Tsybakov noise condition, the rate at which the gap to the Bayes optimum closes is provably improved over passive learning. These results come with matching lower bounds and explicit exponents depending on smoothness and margin parameters (Njike et al., 2023).
Approximation ratios: For multi-armed bandit and combinatorial superprocess variants, polynomial-time constant-factor approximations are established via LP relaxation and rounding, even in NP-hard settings. Examples are a 4-approximation for linear utilities and an $8(1+\epsilon)$-approximation for concave utilities (0805.2630, Chawla et al., 2024).
Double robustness and oracle inequalities: In observational policy learning, orthogonal pseudo-outcomes guarantee consistency and fast rates even when only one of the propensity or outcome models is well-specified (Zhou et al., 13 Apr 2026).
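The double-robustness property can be illustrated with the standard augmented inverse-propensity-weighted (AIPW) pseudo-outcome for a single binary action. For action 1 the pseudo-outcome is $\hat\mu_1(X) + A\,(Y - \hat\mu_1(X))/\hat e(X)$, and its mean estimates $E[Y(1)]$. This is a generic textbook construction, not the doubly robust Q-learning procedure of Zhou et al. The simulated data, the function name `aipw_mean`, and the deliberately misspecified nuisance models are assumptions.

```python
import numpy as np


def aipw_mean(y, a, mu1, e):
    """Doubly robust (AIPW) estimate of E[Y(1)].

    mu1: outcome-model predictions of E[Y | X, A=1]; e: propensity P(A=1 | X).
    The pseudo-outcome is mu1 + A * (Y - mu1) / e.
    """
    pseudo = mu1 + a * (y - mu1) / e
    return pseudo.mean()


rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(size=n)
e_true = 0.2 + 0.6 * x                        # true propensity P(A=1 | x)
a = (rng.uniform(size=n) < e_true).astype(float)
y = 1.0 + 2.0 * x + a + rng.normal(size=n)    # so E[Y(1)] = 2 + 2 * 0.5 = 3
mu1_true = 2.0 + 2.0 * x                      # correct outcome model for A=1
mu1_bad = np.zeros(n)                         # misspecified outcome model
e_bad = np.full(n, 0.5)                       # misspecified propensity

est_outcome_ok = aipw_mean(y, a, mu1_true, e_bad)      # only outcome model correct
est_propensity_ok = aipw_mean(y, a, mu1_bad, e_true)   # only propensity correct
est_both_bad = aipw_mean(y, a, mu1_bad, e_bad)         # both wrong
```

In this simulation, the estimate stays close to $E[Y(1)] = 3$ whenever either nuisance model is correct. When both are wrong, it drifts to about $2\,E[(0.2 + 0.6X)(2 + 2X)] = 3.2$.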

4. Representative Methodologies and Domain Instantiations

A variety of algorithmic templates are instantiated in application domains:

Method/Framework	Setting	Notable Features or Guarantees
Cost-optimal sequential Q-learning	Clinical diagnostics	Doubly robust, handles MAR missingness, regret bounds
Opportunistic Learning (OL)	Online feature acquisition	MC dropout; immediate reward = posterior shift per unit cost
Nonparametric cost-sensitive AL	Active learning/classification	Optimal sample complexity rates via confidence partitioning
Freeze-thaw cost-sensitive BO (CFBO)	Hyperparameter optimization	Expected Utility Improvement, transfer via LC-mixup
Bandit superprocess LP rounding	Combinatorial selection	Cost amortization, local-to-global $\alpha$-approximation
NOCTA (NP/P)	Longitudinal feature acquisition	Non-greedy subset search, nearest-neighbor or parametric utility estimates
PU-ADKA	Expert knowledge for LLMs	PU-based matching, multi-agent RL, cost-sensitive gain
DAG policy learning	Sensor/test selection	Empirical risk minimization, dynamic programming on DAG
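The Opportunistic Learning row describes an immediate reward equal to the shift in the model's predictive distribution per unit feature cost, estimated with MC dropout. The following numpy sketch computes such a reward under several assumptions:

- The model is a toy linear-softmax model with dropout applied to its inputs.
- The shift is measured as the L2 norm of the change in mean class probabilities.
- The names `mc_dropout_probs` and `ol_reward` are hypothetical.

None of this is the authors' network or code.

```python
import numpy as np


def mc_dropout_probs(w, x, mask, p_drop, n_samples, rng):
    """Mean softmax output of a toy linear model under MC dropout on its inputs.

    w: (n_features, n_classes) weights; mask: 1 for acquired features, else 0.
    Unacquired features are zeroed out before the forward pass.
    """
    xs = x * mask
    probs = []
    for _ in range(n_samples):
        keep = rng.uniform(size=xs.shape) >= p_drop        # random dropout mask
        logits = (xs * keep / (1.0 - p_drop)) @ w          # inverted-dropout scaling
        z = np.exp(logits - logits.max())
        probs.append(z / z.sum())
    return np.mean(probs, axis=0)


def ol_reward(w, x, mask, j, cost_j, p_drop=0.1, n_samples=200, seed=0):
    """Hypothetical OL-style reward: predictive shift from acquiring feature j, per unit cost."""
    rng = np.random.default_rng(seed)
    before = mc_dropout_probs(w, x, mask, p_drop, n_samples, rng)
    new_mask = mask.copy()
    new_mask[j] = 1.0
    after = mc_dropout_probs(w, x, new_mask, p_drop, n_samples, rng)
    return np.linalg.norm(after - before) / cost_j


w = np.array([[2.0, -2.0], [0.0, 0.0]])  # feature 0 informative, feature 1 ignored
x = np.array([1.5, 1.5])
mask = np.zeros(2)                       # nothing acquired yet
r0 = ol_reward(w, x, mask, j=0, cost_j=1.0)
r1 = ol_reward(w, x, mask, j=1, cost_j=1.0)
```

Acquiring feature 0 moves the prediction away from uniform and earns a positive reward. Feature 1 has zero weight in this toy model, so acquiring it earns none. Dividing by `cost_j` lets a cheap feature with a moderate effect outrank an expensive feature with a larger one.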

Applications span healthcare (diagnostics, testing, resource allocation), sensor networks, recommendation, finance, and knowledge acquisition for LLMs. In each, information is acquired in stages: features or tests are selected adaptively, and predictive or decision-theoretic loss is balanced against explicit monetary or process costs.

5. Empirical Results and Practical Implications

Empirical benchmarks consistently show that cost-sensitive sequential acquisition methods substantially outperform fixed-feature or passive counterparts, and often improve on previous RL or greedy policies in both accuracy and cost reduction. Key findings include:

CFBO (cost-sensitive freeze-thaw Bayesian optimization) achieves normalized regret reductions of 20–50% over baselines such as Hyperband variants in hyperparameter optimization (Lee et al., 24 Oct 2025).
NOCTA-NP and NOCTA-P dominate actor-critic and greedy mutual information baselines in longitudinal clinical data at all cost regimes (Dinh et al., 16 Jul 2025).
Aggressive pruning via confidence bounds in active learning yields faster convergence to Bayes risk, especially in low-margin regions (Njike et al., 2023).
Opportunistic Learning and related DQN approaches yield cost-performance curves strictly above greedy or static acquisition policies across datasets (Kachuee et al., 2019, Janisch et al., 2019).
In LLM knowledge acquisition, PU-ADKA achieves higher win rates and lower cost per unit of knowledge gain than greedy or random strategies, under both GPT-judged and human-judged evaluations (Wu et al., 24 Aug 2025).

In healthcare, these methods can halve the average acquisition cost at a specified prediction accuracy, providing a quantifiable clinical or financial benefit.
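Claims such as halving the average acquisition cost at a specified accuracy are read off cost-performance curves. For each policy, find the smallest average cost at which the target accuracy is reached, then compare. The following sketch shows this arithmetic. The two curves are made-up numbers for illustration, not results from any cited paper. The helper name `cost_at_accuracy` is hypothetical, and it uses the first curve point meeting the target rather than interpolating between points.

```python
def cost_at_accuracy(curve, target):
    """Smallest average acquisition cost at which a policy reaches `target` accuracy.

    curve: (average cost, accuracy) points for one policy.
    Returns None if the target is never reached.
    """
    for cost, acc in sorted(curve):  # scan in order of increasing cost
        if acc >= target:
            return cost
    return None


# Made-up curves for illustration only (not results from any cited paper).
static_policy = [(2.0, 0.70), (4.0, 0.78), (8.0, 0.84), (16.0, 0.86)]
adaptive_policy = [(2.0, 0.76), (4.0, 0.84), (8.0, 0.86)]
target = 0.84
ratio = (cost_at_accuracy(adaptive_policy, target)
         / cost_at_accuracy(static_policy, target))
```

With these hypothetical curves, the adaptive policy reaches 0.84 accuracy at an average cost of 4 versus 8 for the static one, a cost ratio of 0.5.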

6. Open Problems and Future Directions

Despite substantial advances, several core challenges and open questions remain:

Scalability and combinatorial explosion: Efficiently handling very large feature or action sets, especially in multi-step lookahead or combinatorial settings, requires further advances in subset-search, local approximation, and scalable optimization (Chawla et al., 2024, Valancius et al., 2023).
Robustness to noise and misspecification: Developing and analyzing methods that remain optimal under complex noise or uncertainty models, arbitrary feature dependencies, or stochastic acquisition costs is an active line of work (Njike et al., 2023, Vershinin et al., 22 Dec 2025).
Integrating external priors, human factors, and preference elicitation: Eliciting user or expert utility functions, or integrating domain knowledge and subjective costs, remains an active area (Lee et al., 24 Oct 2025, Wu et al., 24 Aug 2025).
Broader applications: Extending cost-sensitive acquisition frameworks to domains such as legal, financial, and large-system engineering, where costs are heterogeneous and strategic querying is essential (Wu et al., 24 Aug 2025).
Non-myopic, multi-agent, or distributed acquisition policies: Methods for competitive or cooperative multi-agent selection, and distributed policies under communication constraints, remain nascent (Wu et al., 24 Aug 2025, Chawla et al., 2024).

Cost-sensitive sequential acquisition thus forms a rigorous backbone for adaptive, efficient information gathering across modern learning and decision systems, grounded in strong mathematical guarantees and validated by consistent empirical success.