Numerical Instability in High-Dimensional Likelihood Models
- Numerical instability in high-dimensional likelihood models refers to the challenges that arise when the parameter count approaches or exceeds the sample size, producing unstable estimates and degenerate information matrices.
- Mitigation strategies include penalized estimation (e.g., Lasso), composite likelihoods, and robust empirical likelihood methods that stabilize estimation and control error propagation.
- Advanced techniques such as machine learning surrogates and controlled iterative algorithms offer scalable solutions to overcome computational explosions and maintain reliable inference in complex settings.
Numerical instability in high-dimensional likelihood models refers to the array of computational, statistical, and inferential challenges that arise when the number of model parameters approaches or exceeds the available sample size. In such regimes, classical likelihood theory breaks down: maximum likelihood estimators become unstable or non-existent, standard test statistics may lose their nominal distributions, and direct likelihood evaluation can become infeasible due to combinatorial or computational explosions. Research in the past decade has produced a diverse suite of methodological innovations, including penalized estimation, composite and pairwise likelihoods, sample splitting, robust empirical likelihoods, and state-of-the-art machine learning surrogates, to address instability in high-dimensional settings. The sections below provide a comprehensive overview of the origins, manifestations, and mitigation strategies for numerical instability in high-dimensional likelihood models, drawing extensively on technical results and empirical evidence from leading literature.
1. Sources and Manifestations of Instability
The fundamental source of numerical instability in high-dimensional likelihood models is the mismatch between the parameter space dimension p and the information content provided by the sample size n. When p approaches or exceeds n, the likelihood function loses curvature in many directions, resulting in high sensitivity of maximum likelihood estimates (MLEs), inflated variances, and, frequently, non-existence of finite solutions.
Specific sources include:
- Separation in GLMs, where certain linear combinations of covariates perfectly predict outcomes, leading to infinite MLEs (Correia et al., 2019); a minimal numerical sketch follows this list.
- Singular information in mixture models or non-linear regressions, where the Fisher information matrix may become degenerate and yield "slow" statistical rates, which interact with algorithmic sensitivity to further destabilize estimation (Ho et al., 2020).
- Combinatorial explosion in max-stable models or large covariance structures, where the number of likelihood terms or model partitions grows super-exponentially with dimension, making direct evaluation infeasible (Bienvenüe et al., 2014, Wadsworth, 2014).
- Variance amplification in likelihood-free methods using high-dimensional summaries, causing an exponential decay in acceptance rates and unreliable inference due to the "curse of dimensionality" (Kousathanas et al., 2015).
- Numerical error propagation in models that require the evaluation of large matrices or numerically integrated solutions (e.g., ODEs), where small computational errors scale dramatically with increasing dimensionality (Creswell et al., 2023).
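To make the separation phenomenon concrete, the following minimal sketch (a plain-numpy toy with a single covariate; the data and implementation are illustrative assumptions, not code from the cited work) shows the logistic log-likelihood increasing monotonically toward zero as the coefficient grows, so that no finite maximizer exists.

```python
import numpy as np

# Toy demonstration of separation: the outcome is perfectly predicted by the
# sign of a single covariate, so the logistic log-likelihood increases
# monotonically toward 0 as |beta| grows and no finite MLE exists.
rng = np.random.default_rng(0)
x = np.concatenate([rng.uniform(-2.0, -0.5, 50), rng.uniform(0.5, 2.0, 50)])
y = (x > 0).astype(float)                      # perfect separation by sign(x)

def loglik(beta):
    eta = beta * x
    return np.sum(y * eta - np.logaddexp(0.0, eta))   # stable log(1 + exp(eta))

for beta in (1.0, 5.0, 25.0, 125.0):
    print(f"beta = {beta:7.1f}   log-likelihood = {loglik(beta):.6f}")
# Iterative solvers therefore push |beta| toward infinity or stop at an
# arbitrary value determined by their convergence tolerance.
```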
Observable consequences:
- Non-existence or non-uniqueness of MLEs (e.g., perfect separation in logistic regression).
- Ill-conditioned or noninvertible Fisher information matrices (illustrated in the sketch after this list).
- Highly variable, discontinuous, or jagged profile likelihood surfaces.
- Non-asymptotic distributional behavior of test statistics, especially likelihood ratio tests.
- Unstable or divergent optimization trajectories and inflated or underestimated uncertainty.
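The ill-conditioning of the information matrix can be seen directly in a linear-Gaussian model, where the Fisher information is proportional to XᵀX/n. The short sketch below uses hypothetical dimensions and random design matrices purely for illustration.

```python
import numpy as np

# Condition number of the (scaled) Fisher information X'X / n in a
# linear-Gaussian model as the parameter count p approaches the sample size n.
rng = np.random.default_rng(1)
n = 100
for p in (10, 50, 90, 100, 120):
    X = rng.standard_normal((n, p))
    info = X.T @ X / n                         # information matrix (unit noise)
    print(f"p = {p:3d}   condition number = {np.linalg.cond(info):.2e}")
# The condition number explodes near p = n, and for p > n the matrix is rank
# deficient, so inverse-information variance estimates are unstable or undefined.
```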
2. Penalization, Screening, and Regularization
A principal strategy for mitigating instability is penalized estimation, especially via ℓ1-regularization or sample-splitting techniques that enforce sparsity or select active parameter subsets.
- Screening via ℓ1-penalization (Lasso): Critical in reducing the ambient dimension to a manageable "active set," facilitating stable estimation and inference. This forms the core of the multi-split two-sample LR methodology (Städler et al., 2012), and is also prominent in restricted maximum likelihood approaches for causal modeling (Bühlmann et al., 2013).
- Sparse regression and neighborhood selection: Decoupling variable order search from edge (feature) selection stabilizes high-dimensional SEMs, as variable ordering can be performed with non-regularized likelihoods on a small candidate set, followed by Group Lasso-based edge pruning (Bühlmann et al., 2013).
- Sample splitting and multi-split p-value aggregation: Randomly partitioning data into distinct screening and testing subsets eliminates the "double-dipping" of data and ensures correct nominal coverage for p-values. Multi-split aggregation further smooths out the stochasticity inherent in an arbitrary split, leading to reproducible and stable inference (Städler et al., 2012).
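A hedged sketch of the split-then-test pattern is given below for a single random split (the synthetic sparse regression, the use of scikit-learn's LassoCV for screening, and statsmodels OLS for second-half inference are illustrative choices, not the exact procedure of the cited papers; multi-split methods repeat this over many splits and aggregate the resulting p-values).

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

# Single-split screening and testing on a synthetic sparse linear model:
# the Lasso selects an active set on one half of the data, then classical
# OLS inference is run on the other half restricted to that active set.
rng = np.random.default_rng(2)
n, p = 200, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                    # sparse ground truth
y = X @ beta + rng.standard_normal(n)

half = n // 2
X_screen, y_screen = X[:half], y[:half]        # screening half
X_test, y_test = X[half:], y[half:]            # testing half

active = np.flatnonzero(LassoCV(cv=5).fit(X_screen, y_screen).coef_)
ols = sm.OLS(y_test, sm.add_constant(X_test[:, active])).fit()
print("active set :", active)
print("p-values   :", np.round(ols.pvalues[1:], 4))   # drop the intercept
```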
These methods typically rest on sparsity and screening assumptions, which ensure that the active parameter set is much smaller than the sample size and that true signals are not lost in the regularization process.
3. Composite and Pairwise Likelihoods
Composite likelihood techniques replace the intractable or unstable full-likelihood with sums of low-dimensional likelihood components.
- Block-wise composite likelihood: Partitions the data (often spatial or network-structured) into blocks and aggregates the likelihood contributions from block means and within-block residuals. This reduces the computational cost of full-likelihood evaluation to near-linear in the block sizes and sidesteps inversion of large, possibly singular, covariance matrices (Chang et al., 2013).
- Pairwise and partition composite likelihoods: In high-dimensional extreme value theory and covariance estimation, pairwise or partition composite likelihoods only require computation over bivariate or block marginals. Not only is this feasible in high dimensions, but, when truncated adaptively via ℓ1-penalized selection of informative pairs, statistical efficiency can approach that of the full likelihood while maintaining stability (Bienvenüe et al., 2014, Casa et al., 10 Jul 2024); see the sketch after this list.
- Godambe (sandwich) information correction: Because composite likelihoods ignore higher-order dependencies, straightforward variance estimation is typically downward biased. Asymptotic corrections—especially via the Godambe information—restore correct uncertainty quantification for posterior inference (Chang et al., 2013).
- Truncated pairwise likelihood for sparse covariance matrices: Instead of shrinking every parameter, entire pairwise likelihood objects are adaptively selected, ensuring unbiasedness in the retained pairs and facilitating consistent recovery of the nonzero structure as the dimension grows exponentially (Casa et al., 10 Jul 2024).
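A minimal sketch of the pairwise idea for zero-mean Gaussian data is shown below (the dimensions, the scipy-based bivariate densities, and the optional `pairs` argument standing in for adaptive truncation are illustrative assumptions).

```python
import numpy as np
from itertools import combinations
from scipy.stats import multivariate_normal

# Pairwise (composite) Gaussian log-likelihood: the d-dimensional density is
# replaced by a sum of bivariate marginal densities, so no d x d covariance
# matrix is ever inverted.
def pairwise_loglik(X, Sigma, pairs=None):
    d = X.shape[1]
    if pairs is None:                          # all pairs by default; truncated
        pairs = combinations(range(d), 2)      # versions keep only informative ones
    total = 0.0
    for j, k in pairs:
        sub = Sigma[np.ix_([j, k], [j, k])]    # 2 x 2 marginal covariance
        total += multivariate_normal.logpdf(X[:, [j, k]],
                                            mean=np.zeros(2), cov=sub).sum()
    return total

rng = np.random.default_rng(3)
d, n = 50, 200
Sigma_true = np.eye(d)
X = rng.multivariate_normal(np.zeros(d), Sigma_true, size=n)
print("pairwise log-likelihood at the true covariance:",
      round(pairwise_loglik(X, Sigma_true), 2))
```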
4. Robust Empirical Likelihood and Debiasing Techniques
Empirical likelihood (EL) in high dimensions faces instability due to the need to estimate high-dimensional nuisance parameters and the sensitivity to outliers.
- Orthogonalization of estimating equations: Projection of estimating functions onto a space orthogonal to the nuisance parameters ensures that plug-in estimation errors do not destabilize the profile empirical likelihood ratio. Asymptotically, this correction leads to chi-squared or normal limiting distributions and accurate confidence regions even with exponentially growing parameter dimensions (Chang et al., 2018).
- Dual penalization (parameters and Lagrange multipliers): Robust penalized empirical likelihood (RPEL) regularizes not only regression coefficients but also the multipliers enforcing estimating equations. This dual regularization filters out unstable or redundant estimating equations and variables, enhancing both numerical stability and variable selection accuracy, especially with heavy-tailed data (Li et al., 2021).
- Bounded score functions: Incorporating Huber, exponential, or Tukey's biweight bounded scores yields estimators with bounded influence, directly limiting the impact of individual outliers on the optimization landscape and providing stability under contamination and heteroscedasticity (Li et al., 2021).
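As a simple illustration of the last point, the sketch below compares the unbounded least-squares score with Huber and Tukey bounded scores at their conventional tuning constants (the residual values are arbitrary; this is not the RPEL implementation itself).

```python
import numpy as np

# Bounded score (psi) functions: a single extreme residual contributes at most
# c to the Huber score and nothing to the redescending Tukey score, whereas it
# enters the least-squares score at full magnitude.
def psi_huber(r, c=1.345):
    return np.clip(r, -c, c)

def psi_tukey(r, c=4.685):
    inside = r * (1.0 - (r / c) ** 2) ** 2
    return np.where(np.abs(r) <= c, inside, 0.0)

residuals = np.array([-0.5, 1.0, 3.0, 10.0, 100.0])
print("least squares :", residuals)
print("Huber         :", psi_huber(residuals))
print("Tukey biweight:", psi_tukey(residuals))
```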
5. Likelihood-Free, Machine Learning, and Surrogate Methods
Modern likelihood-free and surrogate-based inference methods address instability by bypassing explicit likelihood evaluation in high dimensions or by learning high-dimensional likelihood landscapes.
- Parameter-wise update and sufficient statistics (ABC-PaSS): Likelihood-free MCMC methods that update one parameter at a time while conditioning on a parameter-specific low-dimensional summary statistic maintain stable acceptance rates regardless of the full dimension, dramatically improving scalability (Kousathanas et al., 2015); a toy illustration follows this list.
- Modular Bayesian Optimization (Split-BOLFI): Additive discrepancy modeling and multiple parallel acquisition functions, each operating on a low-dimensional parameter subset, create a robust belief update through an exponentiated loss-likelihood that is automatically tempered against model misspecification. This modularity ensures stable and efficient high-dimensional inference (Thomas et al., 2020).
- Neural importance sampling and normalizing flows: Machine learning surrogates, such as normalizing flows trained to mimic the true likelihood in high-dimensional spaces, permit smooth, GPU-accelerated evaluation and profiling of complex likelihoods, eliminating the sampling noise and jaggedness associated with Monte Carlo evaluations. Gradient-based maximization of profile likelihoods, initialized from neural surrogates, enables rapid and robust profiling in settings such as SMEFT global analyses (Heimel et al., 1 Nov 2024).
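The acceptance-rate argument behind parameter-wise updating can be seen in the toy Gaussian-mean comparison below (the model, the uniform proposals, and the tolerance are illustrative assumptions; ABC-PaSS itself embeds such updates in a full MCMC sampler with parameter-specific sufficient statistics).

```python
import numpy as np

# Joint vs. parameter-wise ABC acceptance for a d-dimensional Gaussian mean
# with per-coordinate sample means as summaries: joint acceptance collapses
# as d grows, while per-coordinate acceptance does not depend on d.
rng = np.random.default_rng(4)
d, n, eps, trials = 20, 100, 0.3, 5000
obs_summary = rng.normal(np.zeros(d), 1 / np.sqrt(n))      # observed summaries

# Joint update: accept only if every summary coordinate matches within eps.
joint_hits = 0
for _ in range(trials):
    theta = rng.uniform(-1, 1, d)
    sim = rng.normal(theta, 1 / np.sqrt(n))
    joint_hits += np.all(np.abs(sim - obs_summary) < eps)

# Parameter-wise update: each coordinate is accepted against its own summary.
theta = rng.uniform(-1, 1, (trials, d))
sim = rng.normal(theta, 1 / np.sqrt(n))
coord_rate = np.mean(np.abs(sim - obs_summary) < eps)

print(f"joint acceptance rate         : {joint_hits / trials:.4f}")
print(f"per-coordinate acceptance rate: {coord_rate:.4f}")
```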
6. Algorithmic Stability, Computational Efficiency, and Asymptotic Guarantees
A fundamental theme in the recent literature is the interplay between algorithmic stability and statistical error.
- Stability/accuracy tradeoffs in iterative algorithms: Decomposition of estimator error into deterministic convergence (e.g., of the "population operator") and sample-induced perturbation is essential for understanding convergence rates and stability. Algorithms such as EM or gradient descent (stable, but slow) and Newton's method (unstable, but fast) can achieve the same statistical accuracy, but with a dramatically different iteration cost when controlled properly (Ho et al., 2020).
- Annealed and controlled SMC for state-space models: Tempering the likelihood via annealing schedules and constructing globally optimal, adaptive proposal distributions via approximate dynamic programming dramatically reduce variance and instability in high-dimensional sequential Monte Carlo algorithms for non-linear, non-Gaussian state-space models (Fulop et al., 2022).
- Deterministic nonlinear filters: Grid-based deterministic integration, as opposed to random particle filtering, enables smooth and stable likelihood surfaces in complex latent variable models, with efficient GPU implementation further scaling performance to high-dimensional regimes (Bégin et al., 2019).
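The contrast between stochastic and deterministic likelihood evaluation is visible even in a one-dimensional toy integral (the latent-Gaussian model, the Gauss-Hermite rule, and the parameter grid below are illustrative assumptions; real filtering problems replace the single integral with a full state-space recursion).

```python
import numpy as np
from scipy.stats import norm

# Integrated likelihood L(theta) = \int N(y; z, 1) N(z; theta, 1) dz evaluated
# (a) by Monte Carlo with fresh draws at each theta and (b) by deterministic
# Gauss-Hermite quadrature; the exact answer is N(y; theta, sqrt(2)).
rng = np.random.default_rng(5)
y = 0.7
thetas = np.linspace(-1.0, 2.0, 61)
nodes, weights = np.polynomial.hermite_e.hermegauss(20)   # probabilists' rule

mc, quad = [], []
for th in thetas:
    z = rng.normal(th, 1.0, size=200)                      # fresh noise per theta
    mc.append(norm.pdf(y, loc=z, scale=1.0).mean())
    quad.append(np.sum(weights * norm.pdf(y, loc=th + nodes, scale=1.0))
                / np.sqrt(2.0 * np.pi))

exact = norm.pdf(y, loc=thetas, scale=np.sqrt(2.0))
print("max |MC   - exact| :", np.max(np.abs(np.array(mc) - exact)))
print("max |quad - exact| :", np.max(np.abs(np.array(quad) - exact)))
# The Monte Carlo profile is jagged across theta; the deterministic quadrature
# profile is smooth and orders of magnitude more accurate.
```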
7. High-Dimensional Testing, Model Misspecification, and Practical Diagnostics
High-dimensional hypothesis testing and model calibration require specialized inferential frameworks to retain validity and control over instability.
- Restricted and directional likelihood ratios: Non-nested model comparisons (screened-out spaces, variable selection) necessitate new asymptotic null distributions, such as weighted sums of chi-squared variables, which must be estimated from the empirical score covariance structure (Städler et al., 2012).
- Adjustment for model misspecification: Likelihood ratio tests in high dimensions remain robust (asymptotically standard normal) under mild moment conditions, provided suitable centering adjustments are performed to account for non-Gaussian tails and excess kurtosis (Dörnemann, 2022). QR-based decompositions for determinant calculations stabilize behavior against high condition numbers.
- Numerical solver error in inverse problems: For likelihoods incorporating numerically integrated solutions (e.g., ODE models), even small solver errors can create spurious local optima or jagged surfaces, especially with adaptive step size solvers. Tight solver tolerances and diagnostic checks for smoothness in the likelihood surface are essential to prevent biased inference in such models (Creswell et al., 2023).
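The solver-tolerance effect can be reproduced in a small hypothetical logistic-growth example (the ODE, the noiseless observations, the Gaussian error scale of 0.1, and the second-difference roughness proxy are all illustrative assumptions rather than the cited study's setup).

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.stats import norm

# Profile a Gaussian log-likelihood over the growth rate r of a logistic ODE,
# solved with a loose and a tight relative tolerance; the loose tolerance
# introduces solver-driven roughness into the likelihood surface.
t_obs = np.linspace(0.0, 10.0, 15)
y_obs = 10.0 / (1.0 + 9.0 * np.exp(-0.8 * t_obs))          # noiseless data, r = 0.8

def loglik(r, rtol):
    sol = solve_ivp(lambda t, y: r * y * (1.0 - y / 10.0), (0.0, 10.0), [1.0],
                    t_eval=t_obs, rtol=rtol, atol=1e-8)
    return norm.logpdf(y_obs, loc=sol.y[0], scale=0.1).sum()

r_grid = np.linspace(0.79, 0.81, 101)
for rtol in (1e-3, 1e-10):
    ll = np.array([loglik(r, rtol) for r in r_grid])
    roughness = np.abs(np.diff(ll, 2)).max()               # jaggedness proxy
    print(f"rtol = {rtol:.0e}   max |second difference| = {roughness:.3e}")
```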
Numerical instability in high-dimensional likelihood models is a multifaceted challenge arising from the complex interplay of statistical, algorithmic, and computational effects. The literature demonstrates that thoughtful regularization, composite and modular approaches, robustified empirical strategies, algorithmic stabilization, and advanced machine learning surrogates can overcome these instabilities, prioritizing both computational tractability and inference validity as data dimensionality continues to grow.