Quotient-NML (qNML): Model Selection Framework
- qNML is an information-theoretic framework that uses normalized maximum likelihood to achieve parameter- and prior-free model selection and structure learning.
- It quantifies evidence through ratios of NML scores, providing asymptotically reliable discrimination information for hypothesis testing and multiple comparisons.
- qNML underpins Bayesian network learning with decomposable, efficient scores that maintain minimax optimality and robust performance in high-dimensional settings.
Quotient-NML (qNML) is an information-theoretic penalized likelihood framework for statistical model comparison and structure learning, rooted in the minimax optimality of normalized maximum likelihood (NML) coding. It provides a parameter-free, prior-free, and sample-optimal approach for hypothesis testing, discrimination quantification, and model selection, particularly in contexts such as multiple comparisons and Bayesian network structure learning. qNML evaluates the strength of evidence by forming ratios of NML scores for competing models, generalizes to weighted likelihoods for cases where standard NML is undefined, and admits efficient, decomposable formulations for high-dimensional applications.
1. Theoretical Foundations
The NML density, defined for a parametric model $\mathcal{M}$ with parameter space $\Theta$ and observed data $x^n$, is given by

$$ p_{\mathrm{NML}}(x^n \mid \mathcal{M}) = \frac{p(x^n \mid \hat{\theta}(x^n))}{C(\mathcal{M}, n)}, \qquad C(\mathcal{M}, n) = \sum_{y^n} p(y^n \mid \hat{\theta}(y^n)), $$

where $\hat{\theta}(x^n)$ denotes the MLE for sample $x^n$ and $C(\mathcal{M}, n)$ is the normalization (regret) constant ensuring minimax worst-case log-loss performance (Bickel, 2010).
qNML quantifies the evidence in favor of one model over another by the ratio

$$ R(x^n) = \frac{p_{\mathrm{NML}}(x^n \mid \mathcal{M}_1)}{p_{\mathrm{NML}}(x^n \mid \mathcal{M}_0)}, $$

where $p_{\mathrm{NML}}(x^n \mid \mathcal{M}_i)$ is the NML density for model $\mathcal{M}_i$ with parameter space $\Theta_i$. The log of this ratio,

$$ \mathrm{DI}(x^n) = \log \frac{p_{\mathrm{NML}}(x^n \mid \mathcal{M}_1)}{p_{\mathrm{NML}}(x^n \mid \mathcal{M}_0)}, $$

is termed the discrimination information (DI) and represents the difference in NML code lengths between models. In contrast to Bayes factors, DI does not require prior specification, and its minimax property does not average over unobserved samples.
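As a minimal illustration (a sketch assuming a Bernoulli setting, not an example drawn from the cited sources), the DI between a free Bernoulli model and a Bernoulli(0.5) point null can be computed exactly: the point null has no free parameters, so its NML density is simply its likelihood, while the free model's normalizer is a finite sum over success counts.

```python
import math

def bernoulli_nml_log(heads: int, n: int) -> float:
    """Log NML density of a binary sequence with `heads` successes out of n."""
    def log_ml(h: int) -> float:
        # log of the maximized likelihood (h/n)^h ((n-h)/n)^(n-h), with 0^0 = 1
        ll = 0.0
        if h > 0:
            ll += h * math.log(h / n)
        if n - h > 0:
            ll += (n - h) * math.log((n - h) / n)
        return ll

    # Exact normalizer C: sum of maximized likelihoods over all sequences,
    # grouped by the number of successes h.
    log_C = math.log(sum(math.comb(n, h) * math.exp(log_ml(h))
                         for h in range(n + 1)))
    return log_ml(heads) - log_C

def discrimination_information(heads: int, n: int, theta0: float = 0.5) -> float:
    """DI = log NML(free Bernoulli) - log p(x | theta0)."""
    log_null = heads * math.log(theta0) + (n - heads) * math.log(1 - theta0)
    return bernoulli_nml_log(heads, n) - log_null
```

With $n = 10$, five heads yield negative DI (the data favor the null), while nine heads yield positive DI (the data favor the free model).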
2. Key Properties and Interpretability
qNML inherits several desirable theoretical guarantees:
- Minimax observed-sample optimality: Each NML component achieves minimax regret for the observed data (Bickel, 2010).
- Asymptotic reliability: For any fixed threshold $\tau > 0$, the probability of misleading evidence, i.e. of DI favoring the incorrect model by at least $\tau$, decays exponentially as the sample size increases:

$$ P\big(\mathrm{DI}(X^n) \geq \tau \,\big|\, \mathcal{M}_0 \text{ true}\big) \to 0 \quad \text{exponentially in } n. $$
- Prior-free operation: No prior distributions are needed for nuisance or interest parameters, in contrast to procedures like the Bayes factor.
- Strong evidence calibration: DI can favor a simple null hypothesis, behaves predictably under increasing sample size, and satisfies vanishing misleading evidence criteria.
- Score equivalence (in structure learning): In Bayesian network applications, qNML assigns identical scores to Markov equivalent DAGs, crucial for search algorithms working over equivalence classes (Silander et al., 2024).
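The exponential decay of misleading evidence can be checked empirically. The following simulation (an illustrative sketch, not an experiment from the cited sources) generates data under a point null Bernoulli(0.5) and estimates how often DI favors the free Bernoulli alternative.

```python
import math
import random

def misleading_rate(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Estimate P(DI > 0) when the point null Bernoulli(0.5) is true,
    i.e. how often data misleadingly favor the free Bernoulli model."""
    def log_ml(h: int) -> float:
        # log maximized Bernoulli likelihood for h successes (0^0 = 1)
        ll = 0.0
        if h:
            ll += h * math.log(h / n)
        if n - h:
            ll += (n - h) * math.log((n - h) / n)
        return ll

    # Exact NML normalizer for the free Bernoulli model, computed once.
    log_C = math.log(sum(math.comb(n, h) * math.exp(log_ml(h))
                         for h in range(n + 1)))
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = sum(rng.random() < 0.5 for _ in range(n))
        di = (log_ml(h) - log_C) - n * math.log(0.5)
        hits += di > 0
    return hits / trials
```

In such runs the misleading-evidence rate drops sharply as $n$ grows, consistent with the reliability property above.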
3. Weighted Quotient-NML and Extensions
When the standard NML is undefined or inapplicable (such as for sufficient statistics or conditional models), qNML can be generalized using weighted likelihoods

$$ L_w(\theta; x^n) = \prod_{i=1}^{n} p(x_i \mid \theta)^{w_i}, $$

leading to the normalized maximum weighted likelihood (NMWL)

$$ p_{\mathrm{NMWL}}(x^n) = \frac{L_w(\hat{\theta}_w(x^n); x^n)}{\sum_{y^n} L_w(\hat{\theta}_w(y^n); y^n)}, $$

where $\hat{\theta}_w$ denotes the weighted-likelihood maximizer.
A weighted-qNML ratio and its log-DI then result by analogy, extending the applicability of DI to a broad class of models and settings (Bickel, 2010).
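For a small binary sample the NMWL normalizer can be brute-forced by enumerating all sequences. The sketch below assumes a weighted-Bernoulli setup (the model and helper names are illustrative, not taken verbatim from Bickel, 2010); for unit weights it reduces to the standard NML.

```python
import math
from itertools import product

def log_nmwl(x, w):
    """Log NMWL density of binary sequence x under per-observation weights w.
    The weighted likelihood prod_i p(x_i|theta)^{w_i} is maximized at
    theta_hat = sum(w_i * x_i) / sum(w_i)."""
    n = len(x)

    def log_max_wlik(seq):
        W = sum(w)
        theta = sum(wi * si for wi, si in zip(w, seq)) / W
        ll = 0.0
        for wi, si in zip(w, seq):
            p = theta if si == 1 else 1 - theta
            # p can only be 0 when no term of that type occurs, so skipping is safe
            if p > 0:
                ll += wi * math.log(p)
        return ll

    # Normalizer: sum of maximized weighted likelihoods over all 2^n sequences.
    log_C = math.log(sum(math.exp(log_max_wlik(seq))
                         for seq in product((0, 1), repeat=n)))
    return log_max_wlik(tuple(x)) - log_C
```

Setting all weights to 1 recovers the standard Bernoulli NML, which provides a simple sanity check on any NMWL implementation.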
Empirical studies, such as a comparison across eight SAT sites and analyses of protein features in proteomics data, demonstrate the robustness of DI to the choice of weights, especially when sample sizes are moderate or large.
4. Practical Computation and Approximations
For low-dimensional or discrete models, the normalizing constant $C(\mathcal{M}, n)$ can be computed exactly. In higher dimensions or with continuous parameters, a Laplace approximation yields

$$ C(\mathcal{M}, n) \approx \left(\frac{n}{2\pi}\right)^{k/2} \int_{\Theta} \sqrt{\det I(\theta)}\, d\theta, $$

with $I(\theta)$ the Fisher information matrix and $k$ the parameter dimension. The code length consequently approximates to

$$ -\log p_{\mathrm{NML}}(x^n) \approx -\log p(x^n \mid \hat{\theta}(x^n)) + \frac{k}{2} \log \frac{n}{2\pi} + \log \int_{\Theta} \sqrt{\det I(\theta)}\, d\theta. $$
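For a single categorical variable the exact computation is tractable: the normalizer $C(K, n)$ for $K$ states and $n$ observations satisfies the known linear-time recurrence $C(K, n) = C(K-1, n) + \frac{n}{K-2}\, C(K-2, n)$. A sketch:

```python
import math

def exact_multinomial_C(K: int, n: int) -> float:
    """Exact NML normalizer C(K, n) for a K-state categorical variable with n
    observations, via the linear-time recurrence
    C(K, n) = C(K-1, n) + (n / (K-2)) * C(K-2, n) for K >= 3."""
    if n == 0:
        return 1.0
    # Base cases: C(1, n) = 1; C(2, n) sums maximized likelihoods over the
    # possible splits of n observations into two bins (0^0 = 1 convention).
    C1 = 1.0
    C2 = sum(math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
             for h in range(n + 1))
    if K == 1:
        return C1
    prev, curr = C1, C2
    for k in range(3, K + 1):
        prev, curr = curr, curr + (n / (k - 2)) * prev
    return curr
```

Small cases can be verified by direct enumeration, e.g. $C(2, 2) = 2.5$ and $C(K, 1) = K$.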
In Bayesian network learning, the Szpankowski–Weinberger closed-form approximation provides an efficient and numerically accurate surrogate for the multinomial NML regret $\mathrm{reg}(K, n) = \log C(K, n)$ of a $K$-state variable with $n$ observations:

$$ \mathrm{reg}(K, n) \approx n \left( \log \alpha + (\alpha + 2) \log C_\alpha - \frac{1}{C_\alpha} \right) - \frac{1}{2} \log \left( C_\alpha + \frac{2}{\alpha} \right), $$

with $\alpha = K/n$ and $C_\alpha = \frac{1}{2} + \frac{1}{2}\sqrt{1 + 4/\alpha}$, enabling constant-time evaluation even in large models (Silander et al., 2024).
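The approximation is a one-liner in practice. The sketch below implements the formula as quoted above (in nats) and can be compared against the exact normalizer for small cases.

```python
import math

def sw_log_regret(K: int, n: int) -> float:
    """Szpankowski-Weinberger approximation to the multinomial NML regret
    log C(K, n), in nats, with alpha = K/n."""
    alpha = K / n
    c = 0.5 + 0.5 * math.sqrt(1.0 + 4.0 / alpha)
    return (n * (math.log(alpha) + (alpha + 2.0) * math.log(c) - 1.0 / c)
            - 0.5 * math.log(c + 2.0 / alpha))
```

Even at small sample sizes the approximation is close: for $K = 2$, $n = 10$ it lands within roughly a tenth of a nat of the exact regret, and it is monotone in $K$ as expected.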
5. Applications in Model Comparison and Structure Learning
qNML provides a general framework for rigorous model comparison, hypothesis testing, and network structure selection:
- Multiple hypothesis testing: The DI statistic offers a calibrated and robust measure of evidence strength across multiple comparisons, with empirical results indicating little need for further multiplicity adjustments when sample sizes are moderate (Bickel, 2010).
- Bayesian network structure learning: qNML defines the score for a network $G$ over $m$ nodes on data $D$ of size $n$ as the sum over nodes:

$$ s_{\mathrm{qNML}}(G; D) = \sum_{i=1}^{m} \log \frac{p_{\mathrm{NML}}(D_{i,\mathrm{pa}_i})}{p_{\mathrm{NML}}(D_{\mathrm{pa}_i})}, $$

where $D_{i,\mathrm{pa}_i}$ is the data for node $i$ together with its parents, each set treated as a single categorical variable, $r_i$ is the number of states of node $i$, $q_i$ the number of parental configurations, and the regret difference $\mathrm{reg}(r_i q_i, n) - \mathrm{reg}(q_i, n)$ serves as a universal penalty. This score is decomposable, hyperparameter-free, and consistent.
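A hedged sketch of the local score for one node, treating the child-plus-parents columns and the parents columns as single categorical variables and using the Szpankowski–Weinberger regret approximation (the function names and data layout are illustrative):

```python
import math
from collections import Counter

def sw_log_regret(K, n):
    # Szpankowski-Weinberger approximation to log C(K, n), in nats
    a = K / n
    c = 0.5 + 0.5 * math.sqrt(1 + 4 / a)
    return n * (math.log(a) + (a + 2) * math.log(c) - 1 / c) - 0.5 * math.log(c + 2 / a)

def log_nml_categorical(columns, n, num_states):
    """Log NML of equal-length columns treated jointly as one categorical
    variable with `num_states` possible configurations."""
    if not columns:  # empty variable set: a single deterministic outcome
        return 0.0
    counts = Counter(zip(*columns))
    ll = sum(c * math.log(c / n) for c in counts.values())
    return ll - sw_log_regret(num_states, n)

def qnml_local_score(child, parents, r_child, q_parents):
    """Local score log p_NML(D_{i,pa_i}) - log p_NML(D_{pa_i})."""
    n = len(child)
    joint = log_nml_categorical([child] + list(parents), n, r_child * q_parents)
    par = log_nml_categorical(list(parents), n, q_parents)
    return joint - par
```

With an empty parent set the score reduces to the node's own log maximized likelihood minus its regret, and a perfectly predictive parent raises the score despite the larger regret term.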
Empirical benchmarks demonstrate that qNML achieves low structural Hamming distance (SHD) to ground truth, robust predictive accuracy, and often yields the most parsimonious networks among compared methods (notably BIC, BDeu, and factorized NML), with minimal tuning or computational overhead (Silander et al., 2024).
6. Implementation Guidelines and Empirical Performance
Implementation of qNML in network learning workflows involves:
- Calculating multinomial MLE-based log-likelihoods for each variable conditioned on its parent set.
- Evaluating regret term differences using the Szpankowski–Weinberger approximation.
- Aggregating decomposable local qNML scores for global model selection.
Due to its node-wise decomposability and lack of adjustable hyperparameters, qNML integrates seamlessly into existing BN structure-search algorithms (greedy search, dynamic programming, etc.), matching the asymptotic running time of BIC and factorized NML.
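The workflow above can be sketched as a K2-style greedy search under a fixed variable ordering (an illustrative sketch; the data layout, helper names, and the choice of greedy search are assumptions, not the specific algorithm of the cited papers). Decomposability means each candidate parent addition touches only one node's local score.

```python
import math
from collections import Counter

def sw_log_regret(K, n):
    # Szpankowski-Weinberger approximation to log C(K, n), in nats
    a = K / n
    c = 0.5 + 0.5 * math.sqrt(1 + 4 / a)
    return n * (math.log(a) + (a + 2) * math.log(c) - 1 / c) - 0.5 * math.log(c + 2 / a)

def local_score(child, parent_cols, r, q):
    # qNML local score: log p_NML(D_{i,pa}) - log p_NML(D_{pa})
    n = len(child)
    joint = Counter(zip(child, *parent_cols))
    score = sum(c * math.log(c / n) for c in joint.values()) - sw_log_regret(r * q, n)
    if parent_cols:
        pa = Counter(zip(*parent_cols))
        score -= sum(c * math.log(c / n) for c in pa.values()) - sw_log_regret(q, n)
    return score

def greedy_parents(data, order, arity, max_parents=2):
    """For each node in `order`, repeatedly add the predecessor that most
    improves its local qNML score; stop when no addition helps."""
    parents = {v: [] for v in order}
    for i, v in enumerate(order):
        q = 1
        best = local_score(data[v], [], arity[v], q)
        while len(parents[v]) < max_parents:
            gains = []
            for cand in order[:i]:
                if cand in parents[v]:
                    continue
                cols = [data[p] for p in parents[v] + [cand]]
                s = local_score(data[v], cols, arity[v], q * arity[cand])
                gains.append((s, cand))
            if not gains:
                break
            s, cand = max(gains)
            if s <= best:
                break
            best, q = s, q * arity[cand]
            parents[v].append(cand)
    return parents
```

On a toy dataset where one variable copies another and a third is empirically independent, the search recovers the dependence and leaves the independent node parentless.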
Empirical results indicate that qNML:
- Excels in model parsimony relative to fNML and BDeu.
- Maintains predictive log-likelihood close to or surpassing competing criteria at moderate to large sample sizes.
- Produces stable and interpretable network structures with the lowest performance variance across varying data sizes.
7. Context, Significance, and Recommendations
qNML advances information-theoretic model selection by combining the minimax foundation of NML with operational tractability and statistical resilience. Its prior-free, hyperparameter-free nature distinguishes it from Bayesian model selection techniques. In complex inference settings—such as high-dimensional multiple comparisons or Bayesian network learning—qNML provides tuning-free, optimally calibrated, and interpretable model selection, with strong asymptotic guarantees and robust empirical performance (Bickel, 2010, Silander et al., 2024).
A plausible implication is that, for practitioners concerned with multiple comparisons or network modeling, qNML offers a principled criterion with desirable theoretical and computational properties. Its penalty term converges to that of the classic BIC asymptotically, ensuring consistency and efficiency in large-sample regimes.