Sample-Dependent Model Selection

Updated 19 April 2026

Sample-dependent model selection is a suite of methodologies that adjust model choice criteria by incorporating empirical data characteristics such as sample size, censoring, and dependence.
It employs techniques like bootstrap bias correction and penalized divergence measures to enhance model accuracy and efficiency in finite-sample and complex data regimes.
Applications span diverse fields from genomics to dynamic decision-making, offering practical benefits in adaptive complexity control and rigorous post-selection inference.

Sample-dependent model selection refers to a family of methodologies in which the choice of statistical, predictive, or explanatory model is adaptively tailored to the observed data—specifically, to properties such as the sample size, censoring mechanism, dependence structure, or even individual data points. Unlike classical criteria such as AIC or BIC that impose fixed complexity penalties (typically functions of sample size and model dimension), sample-dependent criteria introduce corrections or penalties determined by empirical features of the dataset, the inferred structure of the data-generating process, or explicit resampling. This paradigm encompasses a broad spectrum of approaches ranging from bootstrap-based empirical corrections and penalized divergences in small samples, to dynamic, state-aware selection in sequential decision problems.

1. Conceptual Framework and Motivation

Classical model selection—rooted in large-sample asymptotics—often yields suboptimal or biased results when standard regularity conditions break down, such as in the presence of small sample sizes, high censoring rates, or dependence among observations. The inadequacy of universal penalty terms motivates more refined criteria that adapt penalization, estimation, or selection mechanics to the local properties of the observed sample.

The sample-dependent approach arises in several contexts:

Finite-sample bias correction: Bootstrap-based empirical information criteria compensate for biases in information estimates due to small $n$ or high censoring.
Covariate set adaptation: Selection of feature subsets that vary with sample size or task-specific characteristics, optimizing prediction risk across a continuum of sample regimes.
Data-driven complexity control: Penalized divergence measures that add empirical penalties for overfitting in sparse or discrete data scenarios.
Dependence-aware selection: Penalties and selection criteria that adapt to the dependence structure (short/long-range memory, mixing) of the observations.
Dynamic and contextualized selection: Approaches that condition model choice on observed states, actions, or prior choices, frequently formalized in dynamic programming or reinforcement learning settings.

This philosophy pervades numerous domains, from high-dimensional statistics (variable and model selection), multi-task learning, and causal inference, to signal processing, genomics, and time-series forecasting.

2. Bootstrap-based Model Selection and Empirical Penalty Estimation

Bootstrap resampling provides a principled way to correct model selection criteria under limited or censored data. For example, in the standard censored regression (Tobit) model, classical tools such as AIC and BIC are derived under large-$n$ asymptotics and typically fail to account for the substantial information loss caused by censoring or sample sparsity. The empirical information criterion (EIC) is constructed by directly estimating the out-of-sample log-likelihood bias via resampling:

$\mathrm{EIC} = -2\,\ell(Y|\widehat\theta) + \widehat{B},$

where $\widehat{B}$ is the average out-of-sample bias, estimated by resampling the data ($B$ bootstrap replicates) and recomputing the MLE and the log-likelihood for each replicate. Several resampling schemes (nonparametric, parametric, and hybrid bootstrap) provide robustness against model misspecification and better adaptation to the observed censoring mechanism (Su et al., 2020).

Simulation studies demonstrate that in scenarios with high censoring ($\geq 70\%$) and small $n$ ($\leq 120$), nonparametric EIC variants substantially outperform AIC and BIC in correctly identifying the true model. Computational cost scales with the number of resampling iterations and candidate models, but remains tractable for a moderate number of replicates ($B$ of order $100$ or more). This empirical penalty approach generalizes beyond the Tobit model to any setting where bias estimation via resampling is feasible.
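The resampling recipe can be sketched on a deliberately simple case. The example below is an illustration under stated assumptions, not the Tobit procedure of Su et al.: it compares two Gaussian models with closed-form MLEs, uses the nonparametric bootstrap, and takes $\widehat{B}$ to be twice the average optimism $\ell(Y^*|\widehat\theta^*) - \ell(Y|\widehat\theta^*)$ so that it sits on the same deviance scale as $-2\,\ell$. All function names are hypothetical.

```python
import math
import random

def gaussian_loglik(data, mu, sigma2):
    """Log-likelihood of i.i.d. N(mu, sigma2) observations."""
    n = len(data)
    rss = sum((y - mu) ** 2 for y in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)

def fit_zero_mean(data):
    """MLE for the model N(0, sigma2): only the variance is free."""
    return 0.0, sum(y * y for y in data) / len(data)

def fit_free_mean(data):
    """MLE for the model N(mu, sigma2): mean and variance are free."""
    mu = sum(data) / len(data)
    return mu, sum((y - mu) ** 2 for y in data) / len(data)

def eic(data, fit, n_boot=200, seed=0):
    """Return (EIC, B_hat) with a nonparametric bootstrap bias estimate."""
    rng = random.Random(seed)
    n = len(data)
    total = 0.0
    for _ in range(n_boot):
        boot = [data[rng.randrange(n)] for _ in range(n)]
        mu_b, s2_b = fit(boot)
        # Optimism of the replicate fit: log-likelihood on the bootstrap
        # sample minus log-likelihood of the same parameters on the data.
        total += (gaussian_loglik(boot, mu_b, s2_b)
                  - gaussian_loglik(data, mu_b, s2_b))
    b_hat = 2.0 * total / n_boot  # assumed deviance-scale bias term
    mu, s2 = fit(data)
    return -2.0 * gaussian_loglik(data, mu, s2) + b_hat, b_hat

# Toy data with a clearly non-zero mean.
data_rng = random.Random(1)
sample = [data_rng.gauss(2.0, 1.0) for _ in range(30)]
eic_zero, b_zero = eic(sample, fit_zero_mean)
eic_free, b_free = eic(sample, fit_free_mean)
best = "free mean" if eic_free < eic_zero else "zero mean"
print(best, round(b_zero, 2), round(b_free, 2))
```

The penalty here is estimated from the sample itself rather than fixed in advance, which is the defining feature of the approach. For a censored model, the two `fit_*` functions and the log-likelihood would be replaced by their numerically optimized Tobit counterparts.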

3. Model Selection with Adaptive Complexity Penalties

Penalized distance/divergence-based methods provide another avenue for sample-dependent model selection, especially in discrete or small-sample regimes. Penalized Hellinger distance selection corrects for the lack of robustness in classical divergence statistics by introducing an empirical penalty term on the probability mass assigned to empty sample cells:

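$\mathrm{PHD}_h(d_n, f_\theta) = 2\left[\sum_{x:\,d_n(x)>0}\left(\sqrt{d_n(x)}-\sqrt{f_\theta(x)}\right)^2 + h\sum_{x:\,d_n(x)=0} f_\theta(x)\right],$

where $d_n$ denotes the empirical relative frequencies and $f_\theta$ the model probabilities. Here, $h$ controls the penalty's severity; an optimal $h$ can be selected by small-sample cross-validation or Monte Carlo. Penalized criteria of this form exhibit $\sqrt{n}$-scale normal asymptotics for model comparison, enable analytical small-sample power approximations, and consistently outperform fixed-penalty approaches such as AIC when the data are sparse or the model’s support is misspecified (Ngom et al., 2011).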
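A minimal sketch of such a distance for discrete data follows. It assumes one common form of the penalized Hellinger distance: twice the sum of squared root-differences over non-empty cells, plus twice a weight `h` times the model mass on empty cells. With that convention, `h = 1` recovers the ordinary (doubled, squared) Hellinger distance. The function name and the toy probabilities are hypothetical.

```python
import math
from collections import Counter

def penalized_hellinger(sample, model_pmf, h):
    """Penalized Hellinger distance between a sample and a discrete model.

    model_pmf maps each support point to its model probability.
    """
    n = len(sample)
    counts = Counter(sample)
    observed = 0.0    # squared root-differences over non-empty cells
    empty_mass = 0.0  # model probability on cells with no observations
    for x in set(model_pmf) | set(counts):
        d = counts.get(x, 0) / n
        f = model_pmf.get(x, 0.0)
        if d > 0:
            observed += (math.sqrt(d) - math.sqrt(f)) ** 2
        else:
            empty_mass += f
    return 2.0 * (observed + h * empty_mass)

# Toy example: cell 3 is empty in the sample but has model mass 0.15.
sample = [0, 0, 1, 1, 1, 2, 4]
model = {0: 0.20, 1: 0.30, 2: 0.25, 3: 0.15, 4: 0.10}
phd_full = penalized_hellinger(sample, model, h=1.0)
phd_half = penalized_hellinger(sample, model, h=0.5)
print(phd_full, phd_half)
```

Lowering `h` reduces the contribution of empty cells, so a model is punished less for placing mass where a small sample happened to observe nothing.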

4. Sample-size and Task-dependent Selection of Covariates

In multi-task and transfer regression, the optimal selection of predictive features is inherently sample-size dependent: a larger $n$ justifies the inclusion of additional covariates, while for small $n$ only the most informative should be retained. The sample-dependent mapping $n \mapsto \widehat{S}(n)$ is determined by minimizing a Mallows-type criterion, which in generic form trades an estimate of the error of a subset $S$ against a variance term that shrinks with $n$:

$\widehat{S}(n) = \arg\min_{S}\left\{\widehat{R}(S) + \frac{\widehat{V}_S}{n}\right\},$

where $\widehat{V}_S$ estimates the variance penalty for subset $S$ and $\widehat{R}(S)$ denotes the estimated error of the best predictor based on $S$. The selection procedure produces a schedule of feature sets, each tuned to minimize expected risk for a given task size $n$. This approach yields asymptotic consistency and finite-sample stability, especially when pooling across multiple tasks (Azriel et al., 2020).
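The idea of a schedule of feature sets indexed by sample size can be sketched as follows. The example assumes a generic criterion of the form "estimated error of the subset plus an estimated variance term divided by $n$"; the error and variance numbers are made up for illustration and are not taken from Azriel et al.

```python
def select_subset(candidates, n):
    """Pick the subset minimizing error + variance_term / n.

    candidates maps a tuple of feature names to a pair
    (estimated_error, estimated_variance_term); both are hypothetical inputs.
    """
    return min(candidates, key=lambda s: candidates[s][0] + candidates[s][1] / n)

# Hypothetical estimates: richer subsets have lower error but higher variance.
candidates = {
    ("x1",): (0.50, 1.0),
    ("x1", "x2"): (0.30, 2.5),
    ("x1", "x2", "x3"): (0.25, 6.0),
}

# The schedule maps each task size to its selected feature set.
schedule = {n: select_subset(candidates, n) for n in (5, 20, 200)}
for n, subset in schedule.items():
    print(n, subset)
```

With these numbers the selected set grows from one covariate at $n = 5$ to all three at $n = 200$, because the variance term is discounted more heavily as $n$ increases.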

5. Adaptive Penalties in the Presence of Dependence

Sample-dependent model selection extends to settings with dependent observations (short-range, long-range, or anti-persistent dependence). In Gaussian regression with dependent errors, a penalty that guarantees an oracle inequality must reflect the actual dependence structure. For a collection of subspaces $\{S_m\}_{m \in \mathcal{M}}$, the penalty takes the form

$\mathrm{pen}(m) = K\,\rho(\Sigma)\,\frac{D_m}{n},$

where $\rho(\Sigma)$ is the spectral radius of the error covariance matrix $\Sigma$, $D_m$ is the model dimension, and $K$ is a tuning constant. Under long-range or anti-persistent dependence, the penalty becomes nonlinear in model dimension, with a form governed by the memory (Hurst) parameter $H$. Practical selection of $K$ and $H$ leverages data-driven slope heuristics and residual-based Hurst parameter estimation. Applying standard AIC-like penalties without adapting to dependence leads to overfitting and suboptimal convergence rates (Caron et al., 2020).
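To make the role of the spectral radius concrete, the sketch below compares a dimension-proportional penalty under independent errors and under AR(1) errors. It assumes a penalty of the form $K\,\rho(\Sigma)\,D_m/n$ with a hypothetical constant `K = 2.0`, assumes an AR(1) covariance purely for illustration, and approximates the spectral radius by power iteration. None of these choices is claimed to be the procedure of Caron et al.

```python
import math

def ar1_covariance(n, phi, innovation_var=1.0):
    """Covariance matrix of n consecutive values of a stationary AR(1) process."""
    gamma0 = innovation_var / (1.0 - phi ** 2)
    return [[gamma0 * phi ** abs(i - j) for j in range(n)] for i in range(n)]

def spectral_radius(matrix, iterations=300):
    """Approximate the largest eigenvalue of a symmetric PSD matrix.

    Power iteration; the returned Rayleigh quotient never exceeds the
    true largest eigenvalue.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    value = 0.0
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        value = sum(wi * vi for wi, vi in zip(w, v))  # Rayleigh quotient
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return value

def dependence_penalty(dim, n, rho, K=2.0):
    """Assumed penalty form: K * rho(Sigma) * dim / n (K is hypothetical)."""
    return K * rho * dim / n

n = 30
rho_iid = spectral_radius(ar1_covariance(n, phi=0.0))
rho_ar1 = spectral_radius(ar1_covariance(n, phi=0.5))
pen_iid = dependence_penalty(dim=5, n=n, rho=rho_iid)
pen_ar1 = dependence_penalty(dim=5, n=n, rho=rho_ar1)
print(rho_iid, rho_ar1, pen_iid, pen_ar1)
```

Positive autocorrelation inflates $\rho(\Sigma)$ and therefore the penalty, so a criterion calibrated for independent errors would under-penalize large models on such data.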

6. State- or Sample-point Conditional Selection in Dynamic and Predictive Tasks

In systems with nonstationary or time-varying environments, optimal model choice can depend on both the current state variables and the recent sequence of model choices (“action-history”). Formally, this is a stochastic control problem, solved by reinforcement learning (RL) or dynamic programming:

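$\max_{\pi}\ \mathbb{E}\left[\sum_{t \geq 0} \gamma^{t}\, r(X_t, A_t, A_{t-1})\right], \qquad A_t = \pi(X_t, A_{t-1}).$

Here, in generic notation, $X_t$ is the observed state, $A_t$ is the model chosen at time $t$, $r$ is the per-period reward (which may include a cost for switching away from $A_{t-1}$), $\gamma \in (0,1)$ is a discount factor, and $\pi$ is the selection policy.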

Cordoni & Sancetta instantiate this framework in portfolio allocation, where switching costs and covariate-dependent utility drive the dynamic selection among competing models. Fitted Q-iteration with basis expansion yields consistent approximation of the optimal policy under finite-sample, mixing, and basis-complexity conditions (Cordoni et al., 2023). This approach generalizes to any sequential decision context where the “best” model is state- or context-dependent.
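The interplay between state-dependent model quality and switching costs can be sketched with a toy fitted Q-iteration. Everything in the example is an assumption made for illustration: a two-regime Markov environment, two candidate models, a reward of 1 when the chosen model matches the regime, a fixed switching cost, and an indicator basis (so the least-squares step reduces to cell averages). It is not the portfolio setup of Cordoni & Sancetta.

```python
import random

GAMMA = 0.9  # discount factor (hypothetical value)

def simulate(n_steps, switch_cost, seed=0):
    """Collect (regime, prev_model, model, reward, next_regime) tuples
    under uniformly random model choices."""
    rng = random.Random(seed)
    regime, prev = 0, 0
    data = []
    for _ in range(n_steps):
        model = rng.randrange(2)  # exploratory model choice
        reward = (1.0 if model == regime else 0.0) - switch_cost * (model != prev)
        nxt = regime if rng.random() < 0.9 else 1 - regime  # persistent regimes
        data.append((regime, prev, model, reward, nxt))
        regime, prev = nxt, model
    return data

def fitted_q_iteration(data, n_iter=60):
    """Fitted Q-iteration with an indicator basis over (regime, prev, model)."""
    q = {(s, p, a): 0.0 for s in (0, 1) for p in (0, 1) for a in (0, 1)}
    for _ in range(n_iter):
        sums = {k: 0.0 for k in q}
        counts = {k: 0 for k in q}
        for s, p, a, r, s2 in data:
            # Bellman target: the chosen model becomes the next "previous" model.
            target = r + GAMMA * max(q[(s2, a, 0)], q[(s2, a, 1)])
            sums[(s, p, a)] += target
            counts[(s, p, a)] += 1
        q = {k: sums[k] / counts[k] if counts[k] else q[k] for k in q}
    return q

def greedy_policy(q):
    """Map each (regime, prev_model) pair to the model with the highest Q-value."""
    return {(s, p): max((0, 1), key=lambda a: q[(s, p, a)])
            for s in (0, 1) for p in (0, 1)}

cheap = greedy_policy(fitted_q_iteration(simulate(3000, switch_cost=0.1)))
costly = greedy_policy(fitted_q_iteration(simulate(3000, switch_cost=20.0)))
print(cheap)
print(costly)
```

With a small switching cost the learned policy follows the regime, while with a prohibitive cost it keeps the previous model regardless of the regime. This shows why the selection rule has to condition on the action history as well as the state.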

In large-scale machine learning, per-sample model selection or routing can lead to substantial gains in efficiency. In neural ASR (e.g., Whisper), a lightweight classifier routes each audio sample to the smallest ASR model likely to satisfy a user-specified error tolerance, minimizing overall computational cost while retaining accuracy. The routing function is trained as a binary classifier that predicts, from features extracted from the input, whether the small model's output will differ meaningfully from the large model’s. Reported experiments show a reduction in mean inference cost of up to 35% with minimal increase in error (Malard et al., 2023).
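The routing step itself reduces to thresholding a classifier score, as in the sketch below. The scores, the threshold, and the relative costs are hypothetical values chosen for illustration; training the router and extracting audio features are outside the scope of the example.

```python
def route(p_small_ok, threshold):
    """Send a sample to the small model when the router's predicted
    probability that the small model suffices reaches the threshold."""
    return ["small" if p >= threshold else "large" for p in p_small_ok]

def mean_cost(assignments, cost_small, cost_large):
    """Average per-sample inference cost for a given routing."""
    costs = [cost_small if a == "small" else cost_large for a in assignments]
    return sum(costs) / len(costs)

# Hypothetical router outputs for five inputs.
scores = [0.95, 0.40, 0.85, 0.10, 0.70]
assignments = route(scores, threshold=0.8)
avg = mean_cost(assignments, cost_small=1.0, cost_large=4.0)
print(assignments, avg)
```

Raising the threshold sends more samples to the large model, trading compute for accuracy, so the threshold plays the role of the user-specified error tolerance.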

7. Post-selection Inference and Limitations

Sample-dependent procedures, especially those involving non-smooth selection (hard thresholding, model switching), induce non-regular asymptotics. The final estimator may have a distribution that is a mixture over models, and classical Gaussian limit theorems do not apply. Specialized post-selection inference methods can provide valid confidence intervals conditional on the selection process, though such intervals generally require simulation or resampling strategies to approximate the selection-induced distribution (Rothenhäusler, 2020).
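A small simulation illustrates why naive inference fails after non-smooth selection. The sketch assumes a hypothetical pretest estimator for a Gaussian mean: the sample mean is kept only if it is significant at the 5% level and is otherwise replaced by 0. It then measures how often the naively reported 95% interval contains the true mean. This is an illustration of the problem, not an implementation of any specific post-selection method.

```python
import math
import random

def naive_coverage(mu, n=25, sigma=1.0, n_sim=2000, seed=0):
    """Coverage of the naive interval reported after a significance pretest."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    hits = 0
    for _ in range(n_sim):
        ybar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(ybar) > 1.96 * se:
            # Larger model selected: report the usual 95% interval.
            hits += abs(ybar - mu) <= 1.96 * se
        else:
            # Null model selected: the estimate is fixed at 0, so the
            # reported value covers the truth only if mu is exactly 0.
            hits += (mu == 0.0)
    return hits / n_sim

near_threshold = naive_coverage(mu=0.3)
far_from_zero = naive_coverage(mu=2.0)
print(near_threshold, far_from_zero)
```

When the true mean is far from the selection boundary the naive interval behaves as expected, but near the boundary its coverage collapses well below the nominal level, which is the mixture-over-models effect described above.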

Additionally, certain Bayesian model selection methods (Bayes factors, BIC) do not satisfy independence of irrelevant alternatives (IIA): the relative preference between two models can depend on the presence or absence of extraneous models in the candidate class, especially in complex or singular model families such as phylogenetic tree models. This phenomenon, an intrinsic property of routine priors or parametric embeddings, highlights the need for explicit consideration of the full candidate set and possible sensitivity analyses (Zwiernik et al., 2012).

Sample-dependent model selection provides an empirically driven, flexible toolkit for rigorously tailoring complexity and selection criteria to observed data characteristics and dynamic environments. Its methods reinforce the necessity of explicit finite-sample correction, contextual adaptation, and nuanced post-selection inference in modern statistics and machine learning.