Cross-Family Model Selection

Updated 20 May 2026

Cross-family model selection is a process that compares fundamentally different statistical and machine learning models, addressing challenges from non-nested parameterizations and heterogeneous complexity measures.
It relies on diagnostics such as cross-validation, together with hierarchical and adaptive selection methods, to counter estimation bias and support fair performance comparisons across model families.
Recent approaches provide theoretical guarantees and empirical validations, using techniques like distribution-free bounds and relaxation paths to balance interpretability and flexibility.

Cross-family model selection refers to the principled process of comparing, selecting, and performing inference among statistical or machine learning models drawn from distinct architectural or theoretical classes. These are often non-nested or heterogeneously structured families, such as decision trees vs neural networks, block models vs graphons, or exponential-family vs nonparametric models. Classical model-selection tools (AIC, BIC, holdout error, marginal likelihood) are typically designed for comparisons within a fixed model class. Cross-family selection raises structural and methodological challenges: differences in parameterization, complexity, estimation bias, and risk surfaces mean that direct comparison can be misleading or ill-posed. Recent research has produced formal algorithms, performance guarantees, and domain-specialized methodologies for robust cross-family selection, supported by theoretical and empirical analysis.

1. Conceptual Foundations and Problem Scope

Cross-family model selection generalizes model comparison from homogeneous (nested or parametrically related) families to settings where the candidate set includes fundamentally different inductive biases, architectural priors, or mathematical structures. The motivation arises throughout contemporary machine learning, statistical genetics, cognitive modeling, network science, and econometrics, where model uncertainty extends beyond hyperparameter variation to the nature of the generative assumptions themselves.

Notable challenges include:

Non-nestedness (no common parameterization)
Incommensurate complexity measures and regularization
Structural over- or underfitting risks that differ across model classes
Finite-sample sensitivity of standard error metrics to distributional differences between families
Absence of off-the-shelf penalty terms or theory for generalized information criteria

As a result, cross-family selection demands new forms of diagnostics, penalization, and decision-making protocols, often tailored to the particular model classes involved and the scientific questions at stake.

2. Universal and Distribution-Free Cross-Validation Theory

For black-box model comparison, including across families such as boosting vs forests, kernel SVMs vs deep nets, or parametric vs nonparametric models, the cornerstone is cross-validation with appropriate risk estimation. Wager showed that standard cross-validation is asymptotically uninformative about the absolute test error of a given rule, because its leading-order error term does not depend on the model being evaluated. For the same reason it is consistent for model selection: that term cancels in pairwise differences, so the procedure selects the model with smaller excess risk, provided both converge at polynomial rates (Wager, 2019). Hence, under mild assumptions, cross-validation can be used for honest cross-family selection, even between methods with different convergence rates and error surfaces.
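
The cancellation argument suggests scoring every candidate on the same folds and comparing per-observation losses. A minimal sketch, assuming squared-error loss and two hypothetical stand-in families (a constant predictor and a least-squares line); the function names are illustrative and not taken from Wager (2019):

```python
import numpy as np

def kfold_losses(x, y, fit_predict, k, rng):
    """Per-observation squared-error losses from K-fold cross-validation."""
    n = len(y)
    folds = np.array_split(rng.permutation(n), k)
    losses = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        pred = fit_predict(x[train_idx], y[train_idx], x[test_idx])
        losses[test_idx] = (y[test_idx] - pred) ** 2
    return losses

def constant_model(x_tr, y_tr, x_te):
    # Hypothetical family A: ignores features, predicts the training mean.
    return np.full(len(x_te), y_tr.mean())

def linear_model(x_tr, y_tr, x_te):
    # Hypothetical family B: least-squares line.
    coef = np.polyfit(x_tr, y_tr, deg=1)
    return np.polyval(coef, x_te)

def paired_cv_select(x, y, models, k=5, seed=0):
    """Compare two models on identical folds; return winner, mean loss gap, its SE."""
    losses = {}
    for name, f in models.items():
        rng = np.random.default_rng(seed)  # same permutation, hence same folds, per model
        losses[name] = kfold_losses(x, y, f, k, rng)
    first, second = list(models)
    diff = losses[first] - losses[second]  # model-independent error terms cancel here
    se = diff.std(ddof=1) / np.sqrt(len(diff))
    winner = first if diff.mean() < 0 else second
    return winner, diff.mean(), se
```

Because each observation is scored by both models, the reported standard error refers to the loss difference, not to either model's absolute error, which mirrors the point that CV is informative about comparisons rather than about absolute risk.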

Recent work on the generalization of cross-validated model selection delivers distribution-free deviation bounds. By organizing the model space as a Learning Space (a partially ordered collection of hypothesis classes calibrated by VC dimension), one can inject domain knowledge to sharpen selection and improve generalization. Explicit nonasymptotic bounds on type I–IV errors (risk estimation, ERM excess, model-selection bias, and combined oracle excess) are derived for both bounded and unbounded losses, and the selected model is shown to converge almost surely to the true minimal-risk class, provided appropriate complexity controls are imposed (Marcondes et al., 2023).
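
For intuition about what a distribution-free deviation bound buys, the classical finite-candidate case can be sketched with Hoeffding's inequality plus a union bound. This is a simpler, standard result of the same type, not the Learning Space bound of Marcondes et al.; it assumes a loss bounded in a range of width `loss_range` and a validation set independent of the data used to fit the candidates. The function names are hypothetical.

```python
import math

def uniform_deviation_bound(n_val, n_models, delta, loss_range=1.0):
    """Hoeffding + union bound: with probability >= 1 - delta, every candidate's
    validation risk is within this distance of its true risk (bounded loss)."""
    return loss_range * math.sqrt(math.log(2 * n_models / delta) / (2 * n_val))

def select_with_guarantee(val_risks, n_val, delta, loss_range=1.0):
    """Pick the lowest validation risk; return it with its excess-risk guarantee."""
    eps = uniform_deviation_bound(n_val, len(val_risks), delta, loss_range)
    best = min(val_risks, key=val_risks.get)
    # On the high-probability event, the chosen model's true risk exceeds the
    # best true risk among the candidates by at most 2 * eps.
    return best, 2 * eps
```

The guarantee loosens only logarithmically in the number of candidates, which is why structure on the candidate set (such as a Learning Space ordering) matters more than raw candidate count.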

3. Adaptive Methods for Hierarchical and Mixed-Architecture Model Selection

Hierarchical models, mixture models, and settings where candidate models form hierarchically structured or partially nested classes require scalable methods for cross-family selection. In Bayesian hierarchical cognitive modeling, cross-validation with variational Bayes (CVVB) provides a computationally efficient screening mechanism: predictive test likelihoods are estimated for each candidate model, and the resulting ranking is shown to agree with high probability with the marginal-likelihood ranking, even when parameter dimensions are large (Dao et al., 2021). CVVB thus enables rapid pruning of large, structurally diverse model families before more expensive, model-specific methods are applied.
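
The screening step can be sketched as ranking candidates by cross-validated predictive log-likelihood and keeping the top few. A minimal sketch only: where CVVB fits a variational posterior on each training fold, this version substitutes a plug-in maximum-likelihood fit, and the Normal and Laplace candidate families are hypothetical stand-ins for real hierarchical models.

```python
import numpy as np

def normal_logpdf(x, train):
    mu, sd = train.mean(), train.std()          # plug-in MLE (stand-in for VB posterior)
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((x - mu) / sd) ** 2

def laplace_logpdf(x, train):
    loc = np.median(train)                      # MLE of the location
    b = np.mean(np.abs(train - loc))            # MLE of the scale given the location
    return -np.log(2 * b) - np.abs(x - loc) / b

def cv_predictive_loglik(x, logpdf, k=5, seed=0):
    """Sum of held-out log predictive densities over K folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    total = 0.0
    for test_idx in folds:
        train = np.delete(x, test_idx)
        total += logpdf(x[test_idx], train).sum()
    return total

def screen(x, candidates, keep=1, k=5):
    """Keep the `keep` candidates with the highest CV predictive log-likelihood."""
    scores = {name: cv_predictive_loglik(x, f, k) for name, f in candidates.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores
```

Survivors of the screen would then go on to the more expensive, model-specific comparison.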

In reinforcement learning, the ARL-GEN algorithm performs model selection among nested families of transition kernels, with provable high-probability oracle regret bounds. The method proceeds in epochs of model testing and moves to a higher-complexity class only when a statistical discrepancy is detected. Extensions to linear mixture MDPs allow adaptation to sparsity or norm constraints without separability assumptions (Ghosh et al., 2021). The overall regret of adaptive model selection matches that of an oracle that knows the true class in advance, so the statistical overhead of selection is negligible.
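
The escalation rule ("move up a class only when a discrepancy is detected") can be illustrated outside reinforcement learning. The sketch below is not ARL-GEN: it uses nested polynomial regression classes instead of transition-kernel families, a fresh data batch per epoch, and a fixed, hypothetical threshold `tau` on held-out error reduction in place of the algorithm's confidence-based test.

```python
import numpy as np

def heldout_mse(x_tr, y_tr, x_te, y_te, degree):
    coef = np.polyfit(x_tr, y_tr, degree)
    return np.mean((y_te - np.polyval(coef, x_te)) ** 2)

def escalating_selection(draw_batch, max_degree, n_epochs, tau, rng):
    """Start with the simplest class; each epoch, escalate one level only if the
    next class lowers held-out error by more than tau on fresh data."""
    degree = 0
    history = []
    for _ in range(n_epochs):
        x, y = draw_batch(rng)
        half = len(x) // 2
        if degree < max_degree:
            gain = (heldout_mse(x[:half], y[:half], x[half:], y[half:], degree)
                    - heldout_mse(x[:half], y[:half], x[half:], y[half:], degree + 1))
            if gain > tau:  # discrepancy detected: current class is too simple
                degree += 1
        history.append(degree)
    return degree, history

def draw_batch(rng, n=500):
    # Hypothetical data source whose true class is "degree 2".
    x = rng.uniform(-1, 1, n)
    y = 1 + 2 * x + 3 * x**2 + rng.normal(0, 0.5, n)
    return x, y
```

Because the complexity level never decreases and each test uses fresh data, early epochs pay for the simple class only until the evidence against it accumulates.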

4. Domain-Specific Cross-Family Selection: Network Models and Population Genetics

In statistical network analysis, the penalized edge-sampling cross-validation (PNN-CV) framework formalizes model selection across nested and non-nested classes: stochastic block models (SBM), degree-corrected SBMs, and nonparametric graphon models (Yang, 17 Jun 2025). The criterion

$\text{Crit}(\delta^{(m)}) = \text{CV}(\delta^{(m)}) + \lambda_n d_m$

scores candidate $m$ by its cross-validated test loss plus a complexity penalty $\lambda_n d_m$, where $d_m$ measures the complexity of model $m$ and $\lambda_n$ is a sample-size-dependent penalty weight; the candidate minimizing the criterion is selected. Under mild regularity conditions on estimation and separation, PNN-CV is proven to select the correct model class with probability tending to one as $n \to \infty$, even when signal strengths and model dimensions differ across families.
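
Once the cross-validated losses are available, the criterion is straightforward to evaluate. A minimal sketch, assuming the edge-sampling CV losses have already been computed (not shown) and using a simplified, hypothetical parameter count for $d_m$: $K(K+1)/2$ block probabilities for an SBM, plus one degree parameter per node for a degree-corrected SBM.

```python
def pnn_cv_select(cv_loss, dims, lam):
    """Return the candidate minimising CV(m) + lam * d_m, plus all criterion values."""
    crit = {m: cv_loss[m] + lam * dims[m] for m in cv_loss}
    return min(crit, key=crit.get), crit

def sbm_dim(n_nodes, k, degree_corrected=False):
    # Hypothetical, simplified count: K(K+1)/2 symmetric block probabilities,
    # plus one degree parameter per node for a degree-corrected SBM.
    d = k * (k + 1) // 2
    return d + (n_nodes if degree_corrected else 0)
```

Larger values of `lam` push the choice toward low-dimensional families; how $\lambda_n$ should scale with $n$ is governed by the method's theory and is not reproduced here.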

In population genetics, robust cross-family selection between coalescent models (e.g., Kingman with growth vs. $\Xi$-coalescents with multiple mergers) is achieved by constructing a low-dimensional but powerful summary statistic (the singleton-tail statistic) and then conducting a simulation-calibrated likelihood-ratio test (Koskela et al., 2018). The method maintains high power and well-calibrated size under challenging conditions (recombination, selection) but is sensitive to population structure, which illustrates the importance of appropriate statistic selection and simulation-based calibration in cross-family inference.
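
The calibration idea (simulate the summary statistic under each model, form a likelihood ratio, and set the critical value from null simulations) can be sketched generically. This is not the singleton-tail procedure: the simulators below draw Normal vs Student-t samples as hypothetical stand-ins for coalescent simulators, the statistic is a hypothetical tail fraction, and approximating each model's statistic distribution by a Gaussian is an assumption of the sketch.

```python
import numpy as np

def gaussian_logpdf(s, mu, sd):
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((s - mu) / sd) ** 2

def calibrate_lrt(sim_null, sim_alt, n_sims, alpha, rng):
    """Approximate the statistic's distribution under each model by a Gaussian
    fitted to simulations, then set the critical value from null simulations."""
    s0 = np.array([sim_null(rng) for _ in range(n_sims)])
    s1 = np.array([sim_alt(rng) for _ in range(n_sims)])
    mu0, sd0 = s0.mean(), s0.std(ddof=1)
    mu1, sd1 = s1.mean(), s1.std(ddof=1)

    def log_lr(s):
        return gaussian_logpdf(s, mu1, sd1) - gaussian_logpdf(s, mu0, sd0)

    crit = np.quantile(log_lr(s0), 1 - alpha)  # reject the null when log_lr > crit
    return log_lr, crit

def tail_fraction(x, cutoff=2.0):
    # Hypothetical summary statistic: share of observations beyond +/- cutoff.
    return np.mean(np.abs(x) > cutoff)

def sim_null(rng, n=100):
    return tail_fraction(rng.normal(size=n))

def sim_alt(rng, n=100):
    return tail_fraction(rng.standard_t(3, size=n))
```

Size and power can then be checked on fresh simulations from each model, which is also where sensitivity to unmodeled effects (such as population structure in the genetic setting) would show up.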

5. Algorithmic and Experimental Innovations: Relaxation Paths and Model Agreement

Another approach to cross-family selection is the "neural decision tree relaxation" procedure. It seeds initial models from a rigid family (e.g., axis-aligned decision trees) and then constructs a continuous relaxation (neural decision tree, NDT) parametrized by sharpness parameters $\gamma_1, \gamma_2$, which interpolates from crisp DTs to soft neural networks (Oca et al., 2020). By tracking both classification performance and decision-level agreement (Cohen's $\kappa$) as the sharpness is relaxed, the method identifies whether departures from the rigid family yield substantive gains and, if so, how far one must move toward the nonlinear regime to achieve them. This supports a user-informed, computationally light, and interpretable cross-family exploration that shows when the benefits of increased flexibility justify the loss of interpretability.
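
The agreement half of this diagnostic can be sketched by replacing each hard split of a fixed tree with a sigmoid gate and measuring Cohen's $\kappa$ against the crisp tree as the sharpness drops. Assumptions: the depth-2 tree, its thresholds, and its leaf probabilities are hypothetical; a single sharpness `gamma` is shared by all splits (a simplification of the two-parameter $\gamma_1, \gamma_2$ relaxation); and no retraining or accuracy tracking is shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

# Hypothetical depth-2 tree: root splits x0 at T0; children split x1 at T1 / T2.
T0, T1, T2 = 0.0, 0.3, -0.3
LEAVES = np.array([0.1, 0.9, 0.6, 0.0])  # P(class 1) in leaves LL, LR, RL, RR

def crisp_tree(X):
    left = X[:, 0] < T0
    p = np.where(left,
                 np.where(X[:, 1] < T1, LEAVES[0], LEAVES[1]),
                 np.where(X[:, 1] < T2, LEAVES[2], LEAVES[3]))
    return (p > 0.5).astype(int)

def soft_tree(X, gamma):
    # Each hard split becomes a sigmoid gate; large gamma recovers the crisp tree.
    go_left = sigmoid(gamma * (T0 - X[:, 0]))
    l_left = sigmoid(gamma * (T1 - X[:, 1]))
    r_left = sigmoid(gamma * (T2 - X[:, 1]))
    p = (go_left * (l_left * LEAVES[0] + (1 - l_left) * LEAVES[1])
         + (1 - go_left) * (r_left * LEAVES[2] + (1 - r_left) * LEAVES[3]))
    return (p > 0.5).astype(int)

def cohens_kappa(a, b):
    """Cohen's kappa for two binary label vectors."""
    p_o = np.mean(a == b)
    p_e = np.mean(a) * np.mean(b) + np.mean(1 - a) * np.mean(1 - b)
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1 - p_e)
```

Sweeping `gamma` from large to small and plotting $\kappa$ alongside test accuracy shows how far the relaxed model drifts from the tree's decisions before (or whether) accuracy improves.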

6. Selective Inference and Post-Selection Validity in Cross-Family Comparison

A crucial issue in cross-family selection is the validity of inferences made after model selection (post-selection inference). Under a general joint asymptotic CLT for the vector of selection metrics and the post-selection test statistics, pivotal quantities can be constructed that yield valid confidence intervals and hypothesis tests regardless of the selection protocol, whether AIC, cross-validation, or a randomized procedure (Markovic et al., 2017). Adding Gaussian randomization to the selection vector yields affine selection events with tractable distributional representations, which unifies a wide range of model-selection schemes and enables rigorous, uniformly valid selective inference in cross-family contexts.
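
A one-dimensional toy makes the role of randomization concrete. Assume a statistic $X \sim N(\mu, 1)$ is reported only if $X + W > c$, with independent randomization $W \sim N(0, \tau^2)$. Given selection, $X$ has density proportional to $\phi(x - \mu)\,\Phi((x - c)/\tau)$, so its conditional CDF evaluated at $X$ is an exact pivot. The threshold `c`, scale `tau`, and function names are hypothetical; Markovic et al. treat the general asymptotic setting, not this special case.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def selective_cdf_table(mu, c, tau, half_width=12.0, step=1e-3):
    """Numerical CDF of X ~ N(mu, 1) conditional on X + W > c, W ~ N(0, tau^2).
    The conditional density is proportional to phi(x - mu) * Phi((x - c) / tau)."""
    grid = np.arange(mu - half_width, max(mu, c) + half_width, step)
    dens = np.exp(-0.5 * (grid - mu) ** 2) * norm_cdf((grid - c) / tau)
    cdf = np.cumsum(dens)
    return grid, cdf / cdf[-1]

def selective_pvalue(x_obs, mu0, c, tau):
    # One-sided p-value for H0: mu = mu0; uniform on [0, 1] under H0 given selection.
    grid, cdf = selective_cdf_table(mu0, c, tau)
    return 1.0 - np.interp(x_obs, grid, cdf)
```

Comparing against the naive p-value $1 - \Phi(X)$ on the selected draws shows the problem the pivot fixes: selection pushes $X$ upward, so naive tests reject a true null far more often than the nominal level.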

7. Finite-Sample Performance, Limitations, and Outlook

Empirical validations across domains highlight both the capabilities and the caveats of cross-family selection protocols. Penalized cross-validation consistently guards against overfitting when model complexity varies sharply between families. In GMM-based structural econometric modeling, cross-validated selection outperforms AIC/BIC at avoiding overfitting to flexible but misspecified models (Komiyama et al., 2018). Synthetic minimal models designed to dissect out-of-distribution (OOD) error show that even when shortcut-feature use can be detected during training, OOD failure is family-dependent; a robust selection criterion must therefore monitor both within-family signal comparisons and cross-family risk profiles (Li, 13 May 2026).

Limitations persist: grid-based relaxations can be heuristic, extension to arbitrary model types (e.g., kernel SVMs, high-dimensional graphical models) is nontrivial, and computational cost for exhaustive validation grows rapidly with the size of the candidate family. Domain-specific simulation is often required for calibration. Open research avenues include continuous optimization paths for inter-family relaxation, joint model selection-performance uncertainty quantification, and unified cross-family aggregation in non-nested settings.

Key References

Domain/Topic	Principal Method	Reference
Universal CV theory	Model-independent bias cancellation	(Wager, 2019)
Distribution-free bounds	Learning Spaces, VC-type deviation bounds	(Marcondes et al., 2023)
Hierarchical/Bayesian models	CVVB screening and marginal likelihood	(Dao et al., 2021)
Networks	Penalized edge-sample CV (PNN-CV)	(Yang, 17 Jun 2025)
Population genetics	Singleton-tail statistic LRT	(Koskela et al., 2018)
Relaxation paths (DT→NNs)	Neural Decision Tree (NDT) relaxation	(Oca et al., 2020)
RL/adaptive selection	ARL-GEN, ARL-LIN algorithms	(Ghosh et al., 2021)
Structural econometrics	Cross-validated GMM model selection	(Komiyama et al., 2018)
Selective inference	Unified post-selection pivot CLT	(Markovic et al., 2017)
OOD failure, minimal models	Signal separation and shortcut transition	(Li, 13 May 2026)

Each of these frameworks formalizes aspects of cross-family model selection in high-dimensional, structured, and domain-specific regimes, with theoretical guarantees and practical criteria for principled comparison, selection, and inference.