
Luckiness-weighted NML

Updated 6 April 2026
  • LNML is a generalized form of Normalized Maximum Likelihood that incorporates a non-negative luckiness weight, ensuring a proper minimax regret solution.
  • The method regularizes models, particularly in continuous or high-dimensional settings, by modifying the likelihood with a luckiness function to avoid divergence.
  • Practical applications of LNML include multivariate normal models, discrete memoryless sources, and ridge-regression-like scenarios, bridging traditional NML and Bayesian techniques.

Luckiness-weighted Normalized Maximum Likelihood (LNML) is a generalized universal distribution that extends normalized maximum likelihood (NML) to parametric models where NML is ill-defined or divergent, particularly in continuous or high-capacity settings. LNML introduces a non-negative "luckiness" or weight function over the parameter space, regularizing the model and ensuring well-posedness of the minimax regret solution. LNML appears in statistical inference, coding theory, model selection, and recent work on regularized estimation in high-dimensional settings.

1. Formal Definition and Core Properties

Given a parametric family $\mathcal{M} = \{p(x;\theta) : \theta \in \Theta\}$ and a sample $x^n \in \mathcal{X}^n$, standard NML is defined as

$$\bar p^{\mathrm{NML}}_n(x^n) = \frac{\max_{\theta\in\Theta} p(x^n;\theta)}{C_n}, \qquad C_n = \int_{\mathcal{X}^n}\max_{\theta\in\Theta}p(x^n;\theta)\,d\mu(x^n).$$

If $C_n$ diverges (e.g., for Gaussian models), NML is not defined. LNML replaces the maximum likelihood in both numerator and denominator with a luckiness-weighted form using a weight (luckiness) function $\pi(\theta) > 0$:

$$\bar p^{\mathrm{LNML}}_n(x^n) = \frac{\max_{\theta\in\Theta} \big[p(x^n;\theta)\pi(\theta)\big]}{C_n(\pi)}, \qquad C_n(\pi) = \int_{\mathcal{X}^n} \max_{\theta\in\Theta}\big[p(x^n;\theta)\pi(\theta)\big]\, d\mu(x^n).$$

With $\pi(\theta)\equiv 1$, LNML reduces to ordinary NML. LNML is the unique pointwise minimax solution for the luckiness regret

$$R_\pi(q;\theta,x^n) = \log \frac{p(x^n;\theta)\,\pi(\theta)}{q(x^n)},$$

so that

$$\bar p^{\mathrm{LNML}}_n(x^n) = \arg\min_{q:\,\int q = 1}\ \max_{\theta,\,x^n} R_\pi(q;\theta,x^n).$$

LNML always yields a proper (normalized) distribution provided the weighted maximum likelihood is integrable, i.e., $C_n(\pi) < \infty$.
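
To make the definition concrete, the following is a minimal brute-force sketch (illustrative, not from the cited papers) for a Bernoulli model on binary sequences of length $n$, with a beta-shaped luckiness $\pi(\theta) = \theta^a(1-\theta)^a$; setting $a = 0$ recovers ordinary NML, and for this model both are finite.

```python
from itertools import product

def lnml_bernoulli(n, a=0.5):
    """Brute-force LNML over binary sequences of length n for the Bernoulli(theta)
    model with luckiness pi(theta) = theta^a * (1 - theta)^a (a = 0 gives plain NML)."""
    def weighted_max_lik(k):
        # max over theta of theta^(k + a) * (1 - theta)^(n - k + a);
        # the maximizer is the luckiness-tilted estimate (k + a) / (n + 2a)
        t = (k + a) / (n + 2 * a)
        return t ** (k + a) * (1 - t) ** (n - k + a)   # Python's 0**0 == 1 covers the edges

    numer = {s: weighted_max_lik(sum(s)) for s in product([0, 1], repeat=n)}
    C = sum(numer.values())                            # normalizer C_n(pi)
    return {s: v / C for s, v in numer.items()}, C

probs, C = lnml_bernoulli(n=4, a=0.5)
print("C_n(pi) =", C, "  total probability =", sum(probs.values()))  # total == 1
```

The enumeration is exponential in $n$ and is only meant to exhibit the inner maximization and outer normalization appearing in the definition.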

2. Minimax Regret, Asymptotics, and Interpretations

The regret of LNML under the luckiness-weighted regime is defined as

$$R^{L}(x) = -\log p_{\mathrm{LNML}}(x) + \log p_{\hat\theta(x)}(x) = \log C(L) - \log L(\hat\theta(x)),$$

where $L$ denotes the luckiness function and $\hat\theta(x) = \arg\max_{\theta}\, p(x;\theta)L(\theta)$ is the luckiness-tilted maximum likelihood estimator. The worst-case luckiness-weighted regret is

$$\max_{x\in\mathcal{X}^n}\big[R^{L}(x) + \log L(\hat\theta(x))\big] = \log C(L),$$

so LNML achieves constant (luckiness) regret determined by the log normalization term. For regular (smooth) parametric families of dimension $k$, the asymptotic expansion of $\log C_n(L)$ under the Laplace method is

$$\log C_n(L) = \frac{k}{2}\log\frac{n}{2\pi} + \log\int_{\Theta}\sqrt{\det I(\theta)}\;L(\theta)\,d\theta + o(1),$$

where $I(\theta)$ is the Fisher information matrix. This yields the same leading minimax $\tfrac{k}{2}\log n$ growth as NML for appropriate $L$.
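
The constant-regret property stated above is immediate from the definition; writing $\hat\theta(x) = \arg\max_\theta p(x;\theta)L(\theta)$, this is a one-line check (a restatement of the definition, not an additional result):

$$-\log \bar p^{\mathrm{LNML}}(x) = -\log p(x;\hat\theta(x)) - \log L(\hat\theta(x)) + \log C(L),$$

so the luckiness regret $-\log \bar p^{\mathrm{LNML}}(x) + \log\big[p(x;\hat\theta(x))\,L(\hat\theta(x))\big]$ equals $\log C(L)$ for every $x$, while the plain regret $R^{L}(x)$ differs from it only by the data-dependent term $-\log L(\hat\theta(x))$.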

3. Construction and Examples in Key Parametric Families

3.1 Multivariate Normal Models

For observations $x^n = (x_1,\dots,x_n)$ with $x_i \in \mathbb{R}^m$ drawn i.i.d. under the standard $m$-dimensional Gaussian likelihood $N(\mu,\Sigma)$, LNML uses a conjugate-like (normal-inverse-Wishart type) weight as the luckiness $\pi(\mu,\Sigma)$, with hyperparameters that shrink the mean and keep the covariance scale bounded away from degeneracy. This choice ensures convergence of both the maximization and the normalization integrals. The resulting LNML has a closed form, $\bar p^{\mathrm{LNML}}_n(x^n) = \max_{\mu,\Sigma}\big[p(x^n;\mu,\Sigma)\pi(\mu,\Sigma)\big]/C_n(\pi)$, with explicit formulas for $(\hat\mu,\hat\Sigma)$ as weighted MAP-like estimators and $C_n(\pi)$ involving special functions (the multivariate Gamma function).
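
As a stripped-down illustration of how a conjugate-like luckiness tames a divergent normalization integral, the sketch below uses a single observation from $N(\mu,\sigma^2)$ with known $\sigma^2$ and a Gaussian luckiness $\pi(\mu)\propto\exp(-\mu^2/2\tau^2)$; this toy case (a simplification, not the multivariate construction of Miyaguchi, 2017) has LNML density exactly $N(0,\sigma^2+\tau^2)$, which the code checks numerically. With $\pi\equiv 1$ the normalizer diverges and NML does not exist.

```python
import numpy as np
from scipy import integrate, optimize, stats

sigma2, tau2 = 1.0, 4.0      # known noise variance and luckiness width (toy choices)

def weighted_max_lik(x):
    # max over mu of N(x; mu, sigma2) * exp(-mu^2 / (2 * tau2))
    neg = lambda mu: -stats.norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)) \
                     * np.exp(-mu ** 2 / (2 * tau2))
    res = optimize.minimize_scalar(neg, bounds=(-50, 50), method="bounded")
    return -res.fun

# normalizer C(pi): integrate the weighted maximum likelihood over the sample space
C, _ = integrate.quad(weighted_max_lik, -40, 40)

# the LNML density should match N(0, sigma2 + tau2) pointwise
for x in (0.0, 1.0, 3.0):
    lnml = weighted_max_lik(x) / C
    ref = stats.norm.pdf(x, scale=np.sqrt(sigma2 + tau2))
    print(f"x = {x:4.1f}   LNML = {lnml:.5f}   N(0, sigma2 + tau2) = {ref:.5f}")
```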

3.2 Discrete Memoryless Sources (DMS)

With categorical probabilities $\theta = (\theta_1,\dots,\theta_k)$ on a $k$-symbol alphabet and luckiness $\pi(\theta) \propto \prod_{j=1}^{k} \theta_j^{a_j}$ ($a_j > 0$), the LNML numerator becomes

$$\max_{\theta}\ \prod_{j=1}^{k} \theta_j^{\,n_j + a_j} \;=\; \prod_{j=1}^{k} \left(\frac{n_j + a_j}{n + A}\right)^{n_j + a_j}, \qquad A = \sum_{j} a_j,$$

for counts $n_j$ ($\sum_j n_j = n$), and normalization is via summing over count vectors. For $a_j = 1/2$ ("Jeffreys" luckiness), the leading-order regret matches NML, with a different $O(1)$ offset.
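
The following sketch (alphabet size $k$, sample size $n$, and symmetric luckiness exponent $a$ are illustrative choices) computes the normalizer by summing the weighted maximum likelihood over count vectors, each weighted by its multinomial coefficient.

```python
from itertools import product
from math import factorial

def lnml_dms_normalizer(k, n, a=0.5):
    """Normalizer C_n(pi) for a k-symbol DMS with luckiness pi(theta) ~ prod_j theta_j^a,
    computed by summing the weighted maximum likelihood over count vectors."""
    A = k * a
    C = 0.0
    for counts in product(range(n + 1), repeat=k):
        if sum(counts) != n:
            continue
        # number of sequences sharing this count vector
        mult = factorial(n)
        for c in counts:
            mult //= factorial(c)
        # weighted max likelihood: theta_j = (n_j + a) / (n + A) at the maximum
        val = 1.0
        for nj in counts:
            t = (nj + a) / (n + A)
            val *= t ** (nj + a)
        C += mult * val
    return C

print(lnml_dms_normalizer(k=3, n=10, a=0.5))   # Jeffreys-type luckiness
print(lnml_dms_normalizer(k=3, n=10, a=1.0))   # uniform-Dirichlet-type luckiness
```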

3.3 Linear Regression with Gaussian (Ridge) Luckiness

For linear regression with Gaussian errors and a ridge-like luckiness $\pi(\theta) \propto \exp\!\big(-\tfrac{\lambda}{2}\|\theta\|_2^2\big)$, LNML in the supervised predictive version (LpNML) yields not only consistent regularization but a predictive distribution that can be computed exactly as a shifted Gaussian, blending in-sample interpolation with conservative extrapolation in under-determined cases (Bibas et al., 2022).
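
The exact LpNML predictive of Bibas et al. (2022) admits a closed form; the sketch below does not reproduce that formula but illustrates the underlying mechanism under a Gaussian noise assumption: for each candidate label at a test input, the ridge (luckiness-weighted) estimator is refit with the candidate pair included, its likelihood at that candidate is recorded, and the results are normalized over a label grid. The grid, noise variance, and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy training data: y = X w + noise (all sizes and constants are illustrative)
n, d, lam, sigma2 = 20, 5, 1.0, 0.25
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=n)

def ridge_fit(X_, y_, lam_):
    # luckiness-weighted (MAP-like) estimator: argmin ||y - Xw||^2 + lam * ||w||^2
    return np.linalg.solve(X_.T @ X_ + lam_ * np.eye(X_.shape[1]), X_.T @ y_)

def lpnml_style_predictive(x_test, y_grid):
    """pNML-style predictive with ridge luckiness: refit with each candidate label
    included, evaluate its own Gaussian likelihood, then normalize over the grid."""
    vals = []
    for y_cand in y_grid:
        w_hat = ridge_fit(np.vstack([X, x_test]), np.append(y, y_cand), lam)
        resid = y_cand - x_test @ w_hat
        vals.append(np.exp(-resid ** 2 / (2 * sigma2)))
    vals = np.array(vals)
    dy = y_grid[1] - y_grid[0]
    return vals / (vals.sum() * dy)          # normalize to a density on the grid

x_test = rng.normal(size=d)
y_grid = np.linspace(-10.0, 10.0, 401)
q = lpnml_style_predictive(x_test, y_grid)
print("predictive mean ~", (y_grid * q).sum() * (y_grid[1] - y_grid[0]))
```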

4. Algorithmic and Theoretical Insights

The LNML density and its normalization constant typically require an inner maximization and an outer integration, which is often intractable in high dimensions. For penalized empirical risk minimization, let $R(\theta)$ be a penalty (so the luckiness is $\pi(\theta) = e^{-R(\theta)}$); then the LNML code length is

$$-\log \bar p^{\mathrm{LNML}}_n(x^n) = \min_{\theta}\big[-\log p(x^n;\theta) + R(\theta)\big] + \log \int_{\mathcal{X}^n}\exp\!\Big(-\min_{\theta}\big[-\log p(y^n;\theta) + R(\theta)\big]\Big)\,d\mu(y^n).$$

To address these computational challenges, analytic upper bounds ("uLNML") were derived (Miyaguchi et al., 2018). Under smoothness and convexity assumptions, the uLNML bound is computable in closed form for quadratic ($\ell_2$-type) and related convex penalties, and it is uniformly close to the true LNML code length, enabling practical penalty selection (MDL-RS) in high dimensions.
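
As a toy illustration of penalty selection via the LNML code length (in the spirit of MDL-RS, though the actual method relies on analytic uLNML bounds rather than enumeration, and the quadratic penalty below is an arbitrary choice), the snippet scores several penalty weights for a Bernoulli model by computing the code length above exactly for an observed sequence; the smallest value would be selected.

```python
import numpy as np
from math import comb

THETA_GRID = np.linspace(1e-3, 1 - 1e-3, 999)   # grid for the inner minimization

def min_penalized_nll(k, n, lam):
    # min over theta of -log p(x^n; theta) + lam * R(theta), with R(theta) = (theta - 0.5)^2
    nll = -(k * np.log(THETA_GRID) + (n - k) * np.log(1 - THETA_GRID))
    return np.min(nll + lam * (THETA_GRID - 0.5) ** 2)

def lnml_code_length(x, lam):
    """LNML code length for a Bernoulli model with luckiness pi(theta) = exp(-lam * R(theta)),
    computed by brute force over count values (exact up to the theta grid)."""
    n, k = len(x), sum(x)
    logC = np.log(sum(comb(n, j) * np.exp(-min_penalized_nll(j, n, lam))
                      for j in range(n + 1)))
    return min_penalized_nll(k, n, lam) + logC

x = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]               # illustrative observed sequence
for lam in [0.0, 1.0, 5.0, 20.0]:
    print(f"lambda = {lam:5.1f}   LNML code length = {lnml_code_length(x, lam):.4f}")
```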

5. Role of Luckiness and Incorporation of Side Information

The choice of luckiness function $\pi(\theta)$ (equivalently $L(\theta)$, or a penalty $R(\theta) = -\log\pi(\theta)$) encodes prior beliefs, regularization, or auxiliary information:

  • Priors or pseudo-priors: Conjugate-like weights encode beliefs or enforce lower bounds (e.g., for covariance matrices).
  • Side information: Incidental data or null-hypothesis values can be incorporated by constructing $\pi(\theta)$ to bias estimators towards plausible regions or to allow finite regret when ordinary NML diverges.
  • Regularization: Gaussian ($\ell_2$-type) luckiness directly yields ridge-regression behavior, regularizing the hypothesis space and controlling complexity in high-capacity settings (Bibas et al., 2022).
  • Statistical evidence measures: LNML enables discrimination-information measures for assessing evidence in model comparison, with asymptotic calibration and robustness to multiplicity (Bickel, 2010).

6. Connections to NML, Bayesian Mixtures, and $\alpha$-NML

LNML unifies and interpolates various universal coding/prediction paradigms:

  • With $\pi(\theta) \equiv 1$, ordinary NML is recovered.
  • LNML is a limiting case of $\alpha$-NML with a prior $w(\theta)$ playing the role of the luckiness (Bondaschi et al., 2022): Bayesian mixture codes correspond to $\alpha = 1$, and the NML-type predictor to the $\alpha \to \infty$ limit.
  • LNML (with appropriate luckiness) sits on the trade-off between mixture predictors and hard minimax NML, providing a uniform constant-regret bound and avoiding the divergences present in unconstrained NML.

7. Practical Applications and Model Selection

LNML is particularly relevant in:

  • Model selection under MDL: Explicit, finite-complexity penalty even for non-compact or continuous models, augmenting MDL-based criteria (Miyaguchi, 2017, Miyaguchi et al., 2018).
  • High-dimensional penalty selection: The MDL-RS method leverages analytic uLNML to select regularization parameters efficiently, outperforming cross-validation and BIC/AIC in highly-redundant/high-dimensional regimes (Miyaguchi et al., 2018).
  • Prediction under distribution shift: LNML/LpNML provides bounded, calibrated regret and improved robustness over empirical risk minimization, with better out-of-distribution behavior (Bibas et al., 2022).
  • Robust inference and multiple comparisons: LNML-based discrimination information adapts robustly when integrating diverse side information and controls error rates in high-throughput testing (Bickel, 2010).

References:

  • "Normalized Maximum Likelihood with Luckiness for Multivariate Normal Distributions" (Miyaguchi, 2017)
  • "Statistical inference optimized with respect to the observed sample for single or multiple comparisons" (Bickel, 2010)
  • "High-dimensional Penalty Selection via Minimum Description Length Principle" (Miyaguchi et al., 2018)
  • "Alpha-NML Universal Predictors" (Bondaschi et al., 2022)
  • "Beyond Ridge Regression for Distribution-Free Data" (Bibas et al., 2022)
