
Hyvärinen Score

Updated 22 June 2026
  • The Hyvärinen score is a strictly proper scoring rule defined via first and second derivatives of the log-density, enabling inference for unnormalized continuous models.
  • It underpins score matching methods that replace likelihood maximization with closed-form estimators, particularly for exponential family models.
  • The score facilitates robust model comparison and hyperparameter tuning in complex settings such as time series, graphical models, and nonparametric densities.

The Hyvärinen score is a strictly proper, local, homogeneous scoring rule for continuous probability densities, designed to enable parameter inference and model selection in contexts where the normalization constant of the model is intractable or ill-defined. It underpins the "score matching" estimation principle, provides a consistent foundation for bandwidth and hyperparameter selection, and allows for robust model comparison in both parametric and nonparametric settings, including unnormalized and pseudo-likelihood models.

1. Definition and Fundamental Properties

Let $X \in \mathbb{R}^d$ be a random variable with twice-differentiable density $p(x)$. The Hyvärinen score, $S_H(p, x)$, is given by

$$S_H(p, x) = \Delta_x \log p(x) + \frac{1}{2} \|\nabla_x \log p(x)\|^2,$$

where $\nabla_x$ denotes the gradient and $\Delta_x = \sum_{k=1}^d \partial^2/\partial x_k^2$ is the Laplacian. The score depends only on local properties of the (log-)density at $x$, through derivatives up to second order.

Key structural properties include:

  • Strict Properness: The expected score is uniquely minimized when $p$ equals the data-generating density $p_\star$.
  • 2-Locality: Only the derivatives of $\log p$ up to second order at the data point $x$ are required.
  • Homogeneity: Multiplying $p$ by any positive constant leaves $S_H(p, x)$ invariant.
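The definition and the homogeneity property can be checked numerically. The sketch below assumes a one-dimensional density and uses central finite differences; the helper name `hyvarinen_score_1d` and the step size `eps` are illustrative choices, not taken from the cited papers. It evaluates the score of an unnormalized Gaussian log-density, compares it with the closed form $-1/\sigma^2 + (x-\mu)^2/(2\sigma^4)$, and confirms that rescaling the density by a constant leaves the score unchanged.

```python
import math


def hyvarinen_score_1d(log_density, x, eps=1e-4):
    """Approximate S_H(p, x) = (log p)''(x) + 0.5 * (log p)'(x)**2 in one
    dimension, using central finite differences of an unnormalized log-density."""
    f_plus = log_density(x + eps)
    f_mid = log_density(x)
    f_minus = log_density(x - eps)
    grad = (f_plus - f_minus) / (2.0 * eps)
    lap = (f_plus - 2.0 * f_mid + f_minus) / eps**2
    return lap + 0.5 * grad**2


def gaussian_score_closed_form(x, mu, sigma):
    """Closed-form S_H for a N(mu, sigma**2) density."""
    return -1.0 / sigma**2 + (x - mu) ** 2 / (2.0 * sigma**4)


mu, sigma = 1.0, 2.0


def log_unnormalized(x):
    # Gaussian log-density with the normalizing constant dropped.
    return -0.5 * ((x - mu) / sigma) ** 2


def log_rescaled(x):
    # Same density multiplied by an arbitrary positive constant (7.3).
    return log_unnormalized(x) + math.log(7.3)


x0 = 0.3
score_fd = hyvarinen_score_1d(log_unnormalized, x0)
score_rescaled = hyvarinen_score_1d(log_rescaled, x0)
score_exact = gaussian_score_closed_form(x0, mu, sigma)
print(score_fd, score_rescaled, score_exact)
```

The three printed values should agree up to finite-difference rounding, since the additive constant in the log-density cancels in both derivatives.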

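Properness follows from integration by parts; for any $p$ sufficiently smooth and decaying at infinity,

$$\mathbb{E}_{p_\star}[S_H(p, X)] - \mathbb{E}_{p_\star}[S_H(p_\star, X)] = \frac{1}{2}\, \mathbb{E}_{p_\star} \|\nabla_x \log p(X) - \nabla_x \log p_\star(X)\|^2 \ge 0,$$

with equality iff $p = p_\star$ (Mameli et al., 2014, Shao et al., 2017).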
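The integration-by-parts step can be written out explicitly. The derivation below is a sketch under the stated smoothness and decay assumptions, writing $p$ for the candidate density and $p_\star$ for the data-generating density, and assuming that boundary terms vanish.

```latex
\begin{align*}
\mathbb{E}_{p_\star}\!\left[\Delta_x \log p(X)\right]
  &= \int p_\star(x)\, \Delta_x \log p(x)\, dx
   = -\int \nabla_x p_\star(x) \cdot \nabla_x \log p(x)\, dx \\
  &= -\,\mathbb{E}_{p_\star}\!\left[\nabla_x \log p_\star(X) \cdot \nabla_x \log p(X)\right], \\[4pt]
\mathbb{E}_{p_\star}\!\left[S_H(p, X)\right]
  &= \tfrac{1}{2}\,\mathbb{E}_{p_\star}\!\left\|\nabla_x \log p(X)\right\|^2
   - \mathbb{E}_{p_\star}\!\left[\nabla_x \log p_\star(X) \cdot \nabla_x \log p(X)\right] \\
  &= \tfrac{1}{2}\,\mathbb{E}_{p_\star}\!\left\|\nabla_x \log p(X) - \nabla_x \log p_\star(X)\right\|^2
   - \tfrac{1}{2}\,\mathbb{E}_{p_\star}\!\left\|\nabla_x \log p_\star(X)\right\|^2 .
\end{align*}
```

The last term does not depend on $p$, so the expected score is smallest when the two log-gradients agree almost everywhere under $p_\star$, which, under the positivity conditions usually assumed, forces the two normalized densities to coincide.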

2. Score Matching and Estimation Procedures

The Hyvärinen score forms the basis of the "score matching" estimator for parametric models $\{p_\theta : \theta \in \Theta\}$. The empirical Hyvärinen score is minimized in place of the log-likelihood:

$$\hat{\theta} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^n S_H(p_\theta, x_i).$$

This estimator requires only derivatives of the log-density and never involves the normalizing constant. For exponential family densities, $p_\theta(x) \propto \exp\{\theta^\top t(x)\}$, the score matching objective is quadratic in $\theta$ and often yields linear, closed-form estimating equations (Schwank et al., 9 Jan 2025).
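To illustrate the closed-form case, the sketch below fits the unnormalized model $\log \tilde p_\theta(x) = \theta_1 x + \theta_2 x^2$ by score matching. This is a Gaussian in natural parameters, chosen here purely as an example. The empirical objective is $\tfrac{1}{2}\theta^\top A\,\theta + b^\top \theta$, with $A$ the sample mean of $g(x) g(x)^\top$ for $g(x) = (1, 2x)$ and $b = (0, 2)$, so the estimator solves the linear system $A\theta = -b$. The function name is hypothetical.

```python
def score_matching_gaussian_natural(xs):
    """Score matching for log p~(x) = theta1*x + theta2*x**2 (up to a constant).

    Gradient of the log-density in x: theta1 + 2*theta2*x = theta . (1, 2x)
    Laplacian of the log-density:     2*theta2            = theta . (0, 2)
    Empirical objective: 0.5 * theta^T A theta + b^T theta, solved by A theta = -b.
    """
    n = len(xs)
    m1 = sum(xs) / n                 # sample mean of x
    m2 = sum(x * x for x in xs) / n  # sample mean of x**2
    # A = [[1, 2*m1], [2*m1, 4*m2]] and right-hand side -b = (0, -2).
    a11, a12, a22 = 1.0, 2.0 * m1, 4.0 * m2
    det = a11 * a22 - a12 * a12      # equals 4 * (biased sample variance)
    if det <= 0.0:
        raise ValueError("need at least two distinct observations")
    # Cramer's rule for the 2x2 system A theta = (0, -2).
    theta1 = (0.0 * a22 - a12 * (-2.0)) / det
    theta2 = (a11 * (-2.0) - a12 * 0.0) / det
    return theta1, theta2


theta1, theta2 = score_matching_gaussian_natural([1.0, 2.0, 3.0, 4.0, 5.0])
print(theta1, theta2)
```

For this model the solution is $\theta_1 = \bar{x}/s^2$ and $\theta_2 = -1/(2 s^2)$, with $s^2$ the biased sample variance, so it coincides with the maximum-likelihood fit while never evaluating a normalizing constant.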

Robust score matching is achieved by partitioning the data into blocks, computing blockwise estimates, and aggregating via a geometric median-of-means, yielding estimators resilient to contamination and heavy tails while still relying exclusively on derivatives of $\log p_\theta$ (Schwank et al., 9 Jan 2025).
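The block-and-aggregate idea can be sketched as follows. This is a simplified illustration, not the procedure of the cited paper: it assumes the one-dimensional Gaussian natural-parameter model $\log \tilde p_\theta(x) = \theta_1 x + \theta_2 x^2$, fits each block in closed form, and aggregates the block estimates with a geometric median computed by Weiszfeld iterations. All function names, the number of blocks, and the contamination pattern are assumptions made for the example.

```python
import math
import random


def block_estimate(xs):
    """Closed-form score matching fit of log p~(x) = theta1*x + theta2*x**2."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return (mean / var, -1.0 / (2.0 * var))


def geometric_median(points, iters=200, tol=1e-10):
    """Weiszfeld iterations for the point minimizing summed Euclidean distances."""
    dim = len(points[0])
    current = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iters):
        # Inverse-distance weights; the floor avoids division by zero.
        weights = [1.0 / max(math.dist(p, current), 1e-12) for p in points]
        total = sum(weights)
        updated = [
            sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)
        ]
        if math.dist(updated, current) < tol:
            return updated
        current = updated
    return current


def robust_score_matching(xs, n_blocks):
    """Split the data into blocks, fit each block, aggregate by geometric median."""
    blocks = [xs[k::n_blocks] for k in range(n_blocks)]
    return geometric_median([block_estimate(b) for b in blocks])


rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(2000)]
contaminated = clean + [50.0] * 4  # four gross outliers

plain = block_estimate(contaminated)                        # single fit on all data
robust = robust_score_matching(contaminated, n_blocks=20)   # median of block fits
print("plain :", plain)
print("robust:", robust)
```

With standard normal data the target is $\theta = (0, -0.5)$. The four outliers land in four of the twenty blocks, so the geometric median stays close to the target, whereas the single fit on all data is pulled toward a much larger variance.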

3. Applications in Model Selection and Bandwidth Tuning

The Hyvärinen score is widely employed for model comparison, density estimation, and hyperparameter selection in settings where likelihood-based procedures are infeasible. For model selection, the cumulative prequential Hyvärinen score for a sequence of predictions is

$$\mathcal{H}_T(M) = \sum_{t=1}^T S_H\big(p_M(\cdot \mid x_{1:t-1}),\, x_t\big).$$

Unlike the log-score, $\mathcal{H}_T(M)$ is invariant to normalizing constants and does not suffer from issues like Bartlett's paradox or ill-defined Bayes factors with vague priors. Asymptotically, under regularity conditions, for non-nested parametric models $M_1$ and $M_2$,

$$\frac{1}{T}\big(\mathcal{H}_T(M_2) - \mathcal{H}_T(M_1)\big) \longrightarrow D_H(p_\star, M_2) - D_H(p_\star, M_1),$$

where $D_H(p_\star, M)$ is a Fisher-information–type divergence of model $M$ from the data-generating process $p_\star$. The Hyvärinen score thus selects, in the limit, the model minimizing this divergence to the true process, a criterion distinct from Kullback–Leibler optimality (Shao et al., 2017).
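A small prequential comparison can make this concrete. The sketch below assumes a conjugate toy setting that is not taken from the cited paper: each model is $x_t \mid \mu \sim N(\mu, \sigma^2)$ with $\sigma^2$ fixed and prior $\mu \sim N(0, \tau^2)$, so the one-step-ahead predictive density is Gaussian and its Hyvärinen score is available in closed form. The function names, the prior variance `prior_var`, and the two candidate values of `sigma2` are illustrative.

```python
import random


def gaussian_hyvarinen(x, mean, var):
    """S_H for a N(mean, var) predictive density evaluated at x."""
    return -1.0 / var + (x - mean) ** 2 / (2.0 * var**2)


def prequential_score(xs, sigma2, prior_var):
    """Cumulative one-step-ahead Hyvarinen score for the model
    x_t | mu ~ N(mu, sigma2), mu ~ N(0, prior_var), with sigma2 held fixed."""
    total, running_sum = 0.0, 0.0
    for t, x in enumerate(xs):
        # Posterior of mu after the first t observations (conjugate update).
        post_var = 1.0 / (1.0 / prior_var + t / sigma2)
        post_mean = post_var * running_sum / sigma2
        # Predictive density for x_t is N(post_mean, sigma2 + post_var).
        total += gaussian_hyvarinen(x, post_mean, sigma2 + post_var)
        running_sum += x
    return total


rng = random.Random(1)
data = [rng.gauss(0.5, 1.0) for _ in range(500)]

score_m1 = prequential_score(data, sigma2=1.0, prior_var=1e6)  # correct variance
score_m2 = prequential_score(data, sigma2=9.0, prior_var=1e6)  # inflated variance
print(score_m1, score_m2)
```

The model with the lower cumulative score is preferred. The vague prior (`prior_var=1e6`) only affects the first few terms, and the first term is close to zero, whereas the log-score of that first prediction would decrease without bound as the prior variance grows.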

For bandwidth selection, as in BART-based causal inference ("Direct Bayesian Additive Regression Trees for Conditional Average Treatment Effects in Regression Discontinuity Designs" (Kondo et al., 4 Mar 2026)), the empirical Hyvärinen criterion is computed over a grid of bandwidths $h$:

$$\hat{\mathcal{H}}(h) = \frac{1}{n} \sum_{i=1}^n \left[ \ell_i''(h) + \frac{1}{2}\, \ell_i'(h)^2 \right],$$

where $\ell_i'(h)$ and $\ell_i''(h)$ are first and second derivatives of the log-density for observation $i$ with respect to model predictions. The optimal $h$ is chosen by minimizing $\hat{\mathcal{H}}(h)$, sidestepping normalization issues and directly targeting predictive accuracy (Kondo et al., 4 Mar 2026).

4. Time Series, Graphical Models, and Kernel Estimation

In time series, the Hyvärinen estimator operates on sequences or their sufficient statistics, producing estimators for AR, MA, and long-memory ARFIMA models that avoid the need to compute the likelihood normalization or marginalization over latent states (Columbu et al., 2019, Mameli et al., 2014). The efficiency of Hyvärinen-based estimators varies with the process: they are highly competitive in MA and ARFIMA, but less so in highly persistent AR models, where pairwise likelihood often dominates (Mameli et al., 2014, Columbu et al., 2019).
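As a minimal illustration of the Gaussian time-series case, the sketch below estimates the coefficient of a zero-mean AR(1) process with unit innovation variance. It relies on the fact that for $x \sim N(0, \Sigma)$ the score is $\tfrac{1}{2}\|\Sigma^{-1}x\|^2 - \operatorname{tr}(\Sigma^{-1})$, and that the AR(1) precision matrix is tridiagonal, so neither a determinant nor a matrix inverse is formed. The grid search, the known innovation variance, and the function names are simplifying assumptions for the example rather than the procedure of the cited papers.

```python
import random


def ar1_hyvarinen_objective(xs, phi):
    """S_H of a zero-mean Gaussian AR(1) path with unit innovation variance.

    For x ~ N(0, Sigma), S_H = 0.5 * ||Q x||^2 - trace(Q) with Q = Sigma^{-1}.
    For a stationary AR(1), Q is tridiagonal: diagonal (1, 1+phi^2, ..., 1+phi^2, 1)
    and off-diagonal entries -phi.
    """
    n = len(xs)
    diag_inner = 1.0 + phi * phi
    quad = 0.0
    for t in range(n):
        left = xs[t - 1] if t > 0 else 0.0
        right = xs[t + 1] if t < n - 1 else 0.0
        diag = diag_inner if 0 < t < n - 1 else 1.0
        qx_t = diag * xs[t] - phi * (left + right)  # t-th entry of Q x
        quad += qx_t * qx_t
    trace = 2.0 + (n - 2) * diag_inner
    return 0.5 * quad - trace


def fit_ar1(xs, grid):
    """Grid minimizer of the Hyvarinen objective over candidate coefficients."""
    return min(grid, key=lambda phi: ar1_hyvarinen_objective(xs, phi))


rng = random.Random(2)
phi_true, n = 0.5, 5000
x = rng.gauss(0.0, (1.0 / (1.0 - phi_true**2)) ** 0.5)  # stationary start
series = []
for _ in range(n):
    series.append(x)
    x = phi_true * x + rng.gauss(0.0, 1.0)

grid = [k / 100.0 for k in range(-95, 96)]
phi_hat = fit_ar1(series, grid)
print(phi_hat)
```

On this simulated path the minimizer should land near the true coefficient of 0.5; the objective only needs sums of products of neighbouring observations.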

For undirected graphical models and unnormalized exponential families, score matching based on the Hyvärinen score offers closed-form estimators and robustification via the geometric median-of-means (Schwank et al., 9 Jan 2025).

In nonparametric kernel density estimation, the Hyvärinen score enables fully data-driven tuning of both the bandwidth and the exponentiation parameter in exponentiated KDEs. The Hyvärinen-based objective bypasses the intractable normalization constant inherent to exponentiated forms, yielding consistent and optimally convergent estimators for multi-modal densities and densities with outliers (Imai et al., 2022).
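The tuning idea can be sketched with a leave-one-out criterion. The example below makes several assumptions that are not confirmed by the source: the exponentiated KDE is taken to be the unnormalized density $\hat f_h(x)^{\alpha}$ with a Gaussian kernel, the symbols `h` and `alpha` are hypothetical names for the bandwidth and the exponentiation parameter, and the criterion is the plain average of leave-one-out scores over a small grid. Since $\log \hat f_h^{\alpha} = \alpha \log \hat f_h$ up to a constant, the score is $\alpha (\log \hat f_h)'' + \tfrac{1}{2}\alpha^2 \big((\log \hat f_h)'\big)^2$.

```python
import math
import random


def loo_log_kde_derivatives(xs, i, h):
    """First and second derivatives of log f_{-i}(x) at x = xs[i], where f_{-i}
    is the Gaussian-kernel KDE built without observation i."""
    diffs = [xs[i] - xj for j, xj in enumerate(xs) if j != i]
    exponents = [-0.5 * (d / h) ** 2 for d in diffs]
    top = max(exponents)  # subtract the max exponent to avoid underflow to zero
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    # Ratios f'/f and f''/f as kernel-weighted averages.
    d1 = sum(w * (-d / h**2) for w, d in zip(weights, diffs)) / total
    d2 = sum(w * (d * d / h**4 - 1.0 / h**2) for w, d in zip(weights, diffs)) / total
    return d1, d2 - d1 * d1  # (log f)' and (log f)''


def criterion(derivs, alpha):
    """Mean leave-one-out S_H for the unnormalized density f_h(x)**alpha."""
    return sum(alpha * lap + 0.5 * (alpha * grad) ** 2 for grad, lap in derivs) / len(derivs)


rng = random.Random(3)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
h_grid = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
alpha_grid = [0.5, 1.0, 1.5]

results = []
for h in h_grid:
    # Derivatives depend only on h, so compute them once per bandwidth.
    derivs = [loo_log_kde_derivatives(sample, i, h) for i in range(len(sample))]
    for alpha in alpha_grid:
        results.append((criterion(derivs, alpha), h, alpha))

best_score, best_h, best_alpha = min(results)
print(best_h, best_alpha, best_score)
```

Very small bandwidths are penalized through noisy log-gradients and very large ones through oversmoothing, so the selected bandwidth is expected to fall in the interior of the grid. The normalizing constant of $\hat f_h^{\alpha}$ is never computed.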

5. Computation for Intractable and State-space Models

For models with intractable marginal likelihoods or complex latent-variable structures, the Hyvärinen score admits efficient estimation by Monte Carlo. In the prequential framework for Bayesian model comparison, the required derivatives can be estimated from Sequential Monte Carlo (SMC) or SMC2 schemes, even in non-linear and non-Gaussian state-space models. For discrete outputs, finite difference analogues of the score preserve strict propriety and homogeneity (Shao et al., 2017).

This makes the Hyvärinen score suitable for high-dimensional or otherwise complex models, including stochastic volatility driven by Lévy processes and SDE-based population models, where its robustness to prior vagueness and invariance to normalization are essential (Shao et al., 2017).

6. Theoretical Guarantees and Practical Guidance

Consistency, efficiency, and robustness results for Hyvärinen-score–based estimators are well-established:

  • Consistency: Minimum-score estimators are consistent and asymptotically normal, with sandwich (Godambe) variance (Columbu et al., 2019, Mameli et al., 2014).
  • Asymptotic Model Selection: In both i.i.d. and state-space regimes, the prequential Hyvärinen score selects the asymptotically Fisher-information–optimal model (Shao et al., 2017).
  • Bandwidth and Hyperparameter Rates: In kernel-based methods, tuning via the Hyvärinen score achieves established minimax rates for density estimation (Imai et al., 2022).
  • Robustness: Median-of-means aggregation ensures stability in the presence of outliers or heavy-tailed noise (Schwank et al., 9 Jan 2025).

Empirically, the Hyvärinen score exhibits superior performance in multi-modal, contaminated, or non-likelihood-amenable contexts, though higher Monte Carlo variance relative to likelihood-based criteria may be observed in finite-sample or highly non-Gaussian scenarios (Imai et al., 2022, Shao et al., 2017).

7. Summary Table: Core Formulae and Properties

| Context | Hyvärinen/Score Matching Formula | Key Features |
|---|---|---|
| General (continuous) | $S_H(p, x) = \Delta_x \log p(x) + \frac{1}{2} \|\nabla_x \log p(x)\|^2$ | Proper, local, homogeneous |
| Exponential family | $\hat{\theta} = \arg\min_\theta \frac{1}{n} \sum_{i=1}^n S_H(p_\theta, x_i)$, quadratic in $\theta$ | No normalizing constant needed |
| Time series, Gaussian | $S_H(p_\theta, x) = \frac{1}{2}\, x^\top \Sigma_\theta^{-2} x - \operatorname{tr}(\Sigma_\theta^{-1})$ for $x \sim N(0, \Sigma_\theta)$ | Avoids high-dimensional determinants |
| Pseudo/BART bandwidth | $\hat{h} = \arg\min_h \frac{1}{n} \sum_{i=1}^n S_H(p_h, y_i)$ over a bandwidth grid | Posterior MC, no normalization |
| Exponentiated KDE | $S_H$ evaluated on the unnormalized exponentiated KDE | Joint tuning of bandwidth and exponentiation parameter; IHS-LOO consistency |

Minimization is always with respect to the arguments of interest, be they parameters, hyperparameters, or kernel bandwidths.


References: (Kondo et al., 4 Mar 2026, Schwank et al., 9 Jan 2025, Mameli et al., 2014, Columbu et al., 2019, Shao et al., 2017, Imai et al., 2022)
