Hyvärinen Score

Updated 22 June 2026

The Hyvärinen score is a strictly proper scoring rule defined via the first and second derivatives of the log-density, enabling inference for unnormalized continuous models.
It underpins score matching methods that replace likelihood maximization with closed-form estimators, particularly for exponential family models.
The score facilitates robust model comparison and hyperparameter tuning in complex settings such as time series, graphical models, and nonparametric densities.

The Hyvärinen score is a strictly proper, local, homogeneous scoring rule for continuous probability densities, designed to enable parameter inference and model selection in contexts where the normalization constant of the model is intractable or ill-defined. It underpins the "score matching" estimation principle, provides a consistent foundation for bandwidth and hyperparameter selection, and allows for robust model comparison in both parametric and nonparametric settings, including unnormalized and pseudo-likelihood models.

1. Definition and Fundamental Properties

Let $X \in \mathbb{R}^d$ be a random variable with twice-differentiable density $p(x)$. The Hyvärinen score, $S_H(p, x)$, is given by

$S_H(p, x) = \Delta_x \log p(x) + \frac{1}{2} \|\nabla_x \log p(x)\|^2,$

where $\nabla_x$ denotes the gradient and $\Delta_x = \sum_{k=1}^d \partial^2/\partial x_k^2$ is the Laplacian. The score depends only on local properties of the (log-)density at $x$ up to second derivatives.

Key structural properties include:

Strict Properness: Expected score is uniquely minimized when $p$ equals the data-generating density $p_\star$.
2-Locality: Only the derivatives of $\log p$ at the data point $x$, up to second order, are required.
Homogeneity: Multiplying $p$ by any positive constant leaves $S_H(p, x)$ invariant.
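
Locality and homogeneity can be checked numerically. The sketch below is an illustration only: `hyvarinen_score_1d` is a hypothetical helper that approximates the derivatives of a one-dimensional log-density by central finite differences, and the step size `eps` and the Gaussian test case are arbitrary choices.

```python
import math

def hyvarinen_score_1d(log_density, x, eps=1e-3):
    """Approximate S_H(p, x) = (log p)''(x) + 0.5 * ((log p)'(x))**2 in one dimension."""
    f_plus = log_density(x + eps)
    f_mid = log_density(x)
    f_minus = log_density(x - eps)
    grad = (f_plus - f_minus) / (2.0 * eps)            # first derivative of log p
    lap = (f_plus - 2.0 * f_mid + f_minus) / eps**2    # second derivative of log p
    return lap + 0.5 * grad**2

def gaussian_score(x, mu, sigma2):
    """Closed form for N(mu, sigma2): -1/sigma2 + (x - mu)**2 / (2 * sigma2**2)."""
    return -1.0 / sigma2 + (x - mu) ** 2 / (2.0 * sigma2**2)

mu, sigma2 = 1.0, 4.0

def log_unnormalized(x):
    # Gaussian log-density with the normalizing constant dropped
    return -0.5 * (x - mu) ** 2 / sigma2

def log_normalized(x):
    return log_unnormalized(x) - 0.5 * math.log(2.0 * math.pi * sigma2)

x0 = 2.5
score_unnormalized = hyvarinen_score_1d(log_unnormalized, x0)
score_normalized = hyvarinen_score_1d(log_normalized, x0)
score_exact = gaussian_score(x0, mu, sigma2)
```

The three values agree up to finite-difference error, because the additive constant in the log-density vanishes under differentiation.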

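Properness follows from integration by parts; for any $p$ sufficiently smooth and decaying at infinity,

$\mathbb{E}_{p_\star}[S_H(p, X)] - \mathbb{E}_{p_\star}[S_H(p_\star, X)] = \frac{1}{2} \mathbb{E}_{p_\star}\|\nabla_x \log p(X) - \nabla_x \log p_\star(X)\|^2 \geq 0,$

with equality iff $p = p_\star$ (Mameli et al., 2014, Shao et al., 2017).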
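
The integration-by-parts step can be written out as follows, assuming the boundary condition $p_\star(x) \nabla_x \log p(x) \to 0$ as $\|x\| \to \infty$ (stated here as an assumption). First,

$\mathbb{E}_{p_\star}[\Delta_x \log p(X)] = \int p_\star(x) \, \Delta_x \log p(x) \, dx = -\int \nabla_x p_\star(x)^\top \nabla_x \log p(x) \, dx = -\mathbb{E}_{p_\star}[\nabla_x \log p_\star(X)^\top \nabla_x \log p(X)].$

Substituting this into the expected score and completing the square gives

$\mathbb{E}_{p_\star}[S_H(p, X)] = \frac{1}{2} \mathbb{E}_{p_\star}\|\nabla_x \log p(X) - \nabla_x \log p_\star(X)\|^2 - \frac{1}{2} \mathbb{E}_{p_\star}\|\nabla_x \log p_\star(X)\|^2.$

The last term does not depend on $p$, so minimizing the expected score over $p$ amounts to minimizing the first term, which is zero only when the two gradients agree $p_\star$-almost everywhere.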

2. Score Matching and Estimation Procedures

The Hyvärinen score forms the basis of the "score matching" estimator for parametric models $\{p_\theta : \theta \in \Theta\}$. Given observations $x_1, \dots, x_n$, the empirical Hyvärinen score is minimized in place of the negative log-likelihood:

$\hat{\theta} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^n S_H(p_\theta, x_i).$

This estimator requires only derivatives of the log-density and never involves the normalizing constant. For exponential family densities, $p_\theta(x) \propto \exp\{\theta^\top t(x)\}$, the score matching objective is quadratic in $\theta$ and often yields linear, closed-form estimating equations (Schwank et al., 9 Jan 2025).
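
As a minimal sketch of the closed-form case, consider the one-dimensional family with density proportional to $\exp(\theta_1 x + \theta_2 x^2)$, i.e. sufficient statistic $t(x) = (x, x^2)$. This family, the symbols $\theta_1, \theta_2$, and the function name below are choices made for illustration, not taken from the cited work. Since $\nabla_x t(x) = (1, 2x)$ and $\Delta_x t(x) = (0, 2)$, the empirical objective is $\frac{1}{2} \theta^\top A \theta + b^\top \theta$ with $A$ the sample mean of $(1, 2x)(1, 2x)^\top$ and $b = (0, 2)$, so the estimator solves the linear system $A \theta = -b$.

```python
def score_matching_gaussian_natural(xs):
    """Score matching for a density proportional to exp(theta1 * x + theta2 * x**2).

    Solves the 2x2 linear system A theta = -b, where A is the sample mean of
    (1, 2x)(1, 2x)' and b = (0, 2) is the Laplacian of t(x) = (x, x**2).
    """
    n = len(xs)
    m1 = sum(xs) / n                    # sample mean of x
    m2 = sum(x * x for x in xs) / n     # sample mean of x**2
    a11, a12, a22 = 1.0, 2.0 * m1, 4.0 * m2
    b1, b2 = 0.0, 2.0
    det = a11 * a22 - a12 * a12
    # theta = -A^{-1} b, using the explicit inverse of a 2x2 matrix
    theta1 = -(a22 * b1 - a12 * b2) / det
    theta2 = -(-a12 * b1 + a11 * b2) / det
    return theta1, theta2

theta1, theta2 = score_matching_gaussian_natural([1.0, 2.0, 3.0, 4.0, 5.0])
implied_mean = -theta1 / (2.0 * theta2)
implied_variance = -1.0 / (2.0 * theta2)
```

For this particular family the solution coincides with the Gaussian maximum likelihood estimate (sample mean and biased sample variance); no normalizing constant is evaluated at any point.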

Robust score matching is achieved by partitioning the data into blocks, computing blockwise estimates, and aggregating via a geometric median-of-means, yielding estimators resilient to contamination and heavy tails while still relying exclusively on derivatives of $\log p_\theta$ (Schwank et al., 9 Jan 2025).
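
The blockwise aggregation can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of the cited paper: the block estimator is the closed-form score matching solution for a density proportional to $\exp(\theta_1 x + \theta_2 x^2)$, blocks are taken consecutively for reproducibility (a random partition is the more usual choice), and the geometric median is approximated with Weiszfeld iterations. All function names are hypothetical.

```python
import math

def block_estimate(xs):
    """Score matching estimate (theta1, theta2) for exp(theta1*x + theta2*x**2) on one block."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return (mean / var, -0.5 / var)

def geometric_median(points, iters=200, eps=1e-12):
    """Weiszfeld iterations for the point minimizing the sum of Euclidean distances."""
    dim = len(points[0])
    y = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iters):
        # inverse-distance weights, guarded against division by zero
        weights = [
            1.0 / max(math.sqrt(sum((a - b) ** 2 for a, b in zip(p, y))), eps)
            for p in points
        ]
        total = sum(weights)
        y = [sum(w * p[k] for w, p in zip(weights, points)) / total for k in range(dim)]
    return y

def mom_score_matching(xs, n_blocks):
    """Geometric median of blockwise score matching estimates (consecutive blocks)."""
    size = len(xs) // n_blocks
    blocks = [xs[i * size:(i + 1) * size] for i in range(n_blocks)]
    return geometric_median([block_estimate(b) for b in blocks])

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
data = clean * 4 + [1.0, 2.0, 3.0, 4.0, 1000.0]   # one gross outlier in the last block
robust = mom_score_matching(data, n_blocks=5)
naive = block_estimate(data)                      # single estimate on all data
```

With four clean blocks and one contaminated block, the geometric median stays at the clean-block estimate, whereas the single full-sample estimate is pulled far away by the outlier.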

3. Applications in Model Selection and Bandwidth Tuning

The Hyvärinen score is widely employed for model comparison, density estimation, and hyperparameter selection in settings where likelihood-based procedures are infeasible. For model selection, the cumulative prequential Hyvärinen score of a model $M$ with one-step-ahead predictive densities $p_M(\cdot \mid y_{1:t-1})$ on observations $y_1, \dots, y_T$ is

$\mathcal{H}_T(M) = \sum_{t=1}^T S_H\big(p_M(\cdot \mid y_{1:t-1}), y_t\big).$

Unlike the log-score, $\mathcal{H}_T(M)$ is invariant to normalizing constants and does not suffer from issues like Bartlett’s paradox or ill-defined Bayes factors with vague priors. Asymptotically, under regularity, for non-nested parametric models $M_1$ and $M_2$,

$\frac{1}{T} \{\mathcal{H}_T(M_2) - \mathcal{H}_T(M_1)\} \to D_H(p_\star, M_2) - D_H(p_\star, M_1) \quad \text{as } T \to \infty,$

where $D_H(p_\star, M_j)$ is a Fisher-information–type divergence between the data-generating process and model $M_j$. The Hyvärinen score thus selects, in the limit, the model minimizing this divergence to the true process, a criterion distinct from Kullback–Leibler optimality (Shao et al., 2017).
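
A small sketch of a prequential comparison, under assumptions chosen for illustration: each candidate model takes $y_t \sim N(\mu, \sigma^2)$ with $\sigma^2$ fixed and a conjugate $N(m_0, v_0)$ prior on $\mu$, so the one-step-ahead predictive is Gaussian and its Hyvärinen score is available in closed form. The function names, the deterministic test sequence, and the prior settings are all hypothetical.

```python
def gaussian_hyvarinen(y, mean, var):
    """Hyvarinen score of N(mean, var) at y: -1/var + (y - mean)**2 / (2 * var**2)."""
    return -1.0 / var + (y - mean) ** 2 / (2.0 * var**2)

def prequential_hyvarinen(ys, sigma2, prior_mean=0.0, prior_var=1e6):
    """Cumulative prequential score for y_t ~ N(mu, sigma2), mu ~ N(prior_mean, prior_var)."""
    m, v, total = prior_mean, prior_var, 0.0
    increments = []
    for y in ys:
        # one-step-ahead predictive is N(m, v + sigma2)
        inc = gaussian_hyvarinen(y, m, v + sigma2)
        increments.append(inc)
        total += inc
        # conjugate posterior update for mu
        post_prec = 1.0 / v + 1.0 / sigma2
        m = (m / v + y / sigma2) / post_prec
        v = 1.0 / post_prec
    return total, increments

# Deterministic sequence with mean 0 and second moment 1
ys = [1.0 if t % 2 == 0 else -1.0 for t in range(200)]
score_a, inc_a = prequential_hyvarinen(ys, sigma2=1.0)    # variance matches the data
score_b, inc_b = prequential_hyvarinen(ys, sigma2=25.0)   # variance far too large
```

The model with the lower cumulative score is preferred, here the one with $\sigma^2 = 1$. Under the very vague prior the first increment is close to zero rather than diverging, which illustrates why the criterion remains usable where the log-score would be dominated by the prior scale.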

For bandwidth selection, as in BART-based causal inference ("Direct Bayesian Additive Regression Trees for Conditional Average Treatment Effects in Regression Discontinuity Designs" (Kondo et al., 4 Mar 2026)), the empirical Hyvärinen criterion over a grid of bandwidths $h$ is computed:

$\widehat{H}(h) = \frac{1}{n} \sum_{i=1}^n \left[ \ell_i''(h) + \frac{1}{2} \{\ell_i'(h)\}^2 \right],$

where $\ell_i'(h)$ and $\ell_i''(h)$ are first and second derivatives of the log-density term for observation $i$ with respect to model predictions. The optimal $h$ is chosen by minimizing $\widehat{H}(h)$, sidestepping normalization issues and directly targeting predictive accuracy (Kondo et al., 4 Mar 2026).

4. Time Series, Graphical Models, and Kernel Estimation

In time series, the Hyvärinen estimator operates on sequences or their sufficient statistics, producing estimators for AR, MA, and long-memory ARFIMA models that avoid the need to compute the likelihood normalization or marginalization over latent states (Columbu et al., 2019, Mameli et al., 2014). The efficiency of Hyvärinen-based estimators varies with the process: they are highly competitive in MA and ARFIMA, but less so in highly persistent AR models, where pairwise likelihood often dominates (Mameli et al., 2014, Columbu et al., 2019).
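
To illustrate why no determinant is needed in the Gaussian case, note that for $x \sim N(0, \Sigma_\theta)$ with precision $Q_\theta = \Sigma_\theta^{-1}$ the score reduces to $-\operatorname{tr}(Q_\theta) + \frac{1}{2} \|Q_\theta x\|^2$. The sketch below evaluates this for a zero-mean stationary AR(1) path, whose precision matrix is tridiagonal. It is a simplified illustration applied to the joint density of one path (length at least 2), not necessarily the estimator studied in the cited papers; `ar1_hyvarinen_score` is a hypothetical name.

```python
def ar1_hyvarinen_score(xs, phi, sigma2=1.0):
    """S_H for a zero-mean stationary Gaussian AR(1) path of length >= 2.

    The precision matrix is (1/sigma2) times a tridiagonal matrix with diagonal
    (1, 1 + phi**2, ..., 1 + phi**2, 1) and off-diagonal entries -phi.
    """
    n = len(xs)
    diag = [1.0 + phi * phi] * n
    diag[0] = diag[-1] = 1.0
    qx = []
    for t in range(n):
        val = diag[t] * xs[t]
        if t > 0:
            val -= phi * xs[t - 1]
        if t < n - 1:
            val -= phi * xs[t + 1]
        qx.append(val / sigma2)        # t-th entry of Q x
    trace_q = sum(diag) / sigma2
    return -trace_q + 0.5 * sum(v * v for v in qx)

score_half = ar1_hyvarinen_score([1.0, 0.0, -1.0], phi=0.5)
score_zero = ar1_hyvarinen_score([1.0, 0.0, -1.0], phi=0.0)
```

An estimate of the autoregressive coefficient could then be obtained by minimizing this function over a grid, for example `min(grid, key=lambda phi: ar1_hyvarinen_score(xs, phi))`, at a cost linear in the path length.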

For undirected graphical models and unnormalized exponential families, score matching based on the Hyvärinen score offers closed-form estimators and robustification via the geometric median-of-means (Schwank et al., 9 Jan 2025).

In nonparametric kernel density estimation, the Hyvärinen score enables fully data-driven tuning of both bandwidth ($h$) and exponentiation parameter ($\alpha$) in exponentiated KDEs. The Hyvärinen-based objective bypasses the intractable normalization constant inherent to exponentiated forms, yielding consistent and optimally convergent estimators for multi-modal densities and densities with outliers (Imai et al., 2022).
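
A hedged sketch of this kind of tuning in one dimension follows. It assumes that the exponentiated estimator is proportional to a Gaussian KDE raised to a power (called `alpha` here) and that the criterion is the mean leave-one-out Hyvärinen score; the cited paper's exact construction may differ, and the function names are hypothetical. Because only ratios $f'/f$ and $f''/f$ enter, the normalizing constant of the exponentiated density is never required.

```python
import math

def loo_hyvarinen_kde(xs, h, alpha=1.0):
    """Mean leave-one-out Hyvarinen score of a density proportional to (Gaussian KDE)**alpha."""
    total = 0.0
    for i, x in enumerate(xs):
        others = [xj for j, xj in enumerate(xs) if j != i]
        # numerically stable kernel weights (softmax of the Gaussian exponents)
        logits = [-0.5 * ((x - xj) / h) ** 2 for xj in others]
        top = max(logits)
        w = [math.exp(l - top) for l in logits]
        norm = sum(w)
        # f'/f and f''/f of the leave-one-out KDE, written as weighted averages
        d1 = sum(wj * (-(x - xj) / h**2) for wj, xj in zip(w, others)) / norm
        d2 = sum(wj * ((x - xj) ** 2 / h**4 - 1.0 / h**2) for wj, xj in zip(w, others)) / norm
        grad = alpha * d1                # first derivative of log p
        lap = alpha * (d2 - d1 * d1)     # second derivative of log p
        total += lap + 0.5 * grad * grad
    return total / len(xs)

def select_bandwidth(xs, grid, alpha=1.0):
    """Pick the bandwidth on the grid with the smallest leave-one-out criterion."""
    return min(grid, key=lambda h: loo_hyvarinen_kde(xs, h, alpha))

best_h = select_bandwidth([0.0, 1.0], [0.5, 1.0, 2.0])
```

Joint tuning would simply extend the grid to pairs of bandwidth and exponent values and minimize the same function over both.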

5. Computation for Intractable and State-space Models

For models with intractable marginal likelihoods or complex latent-variable structures, the Hyvärinen score admits efficient estimation by Monte Carlo. In the prequential framework for Bayesian model comparison, the required derivatives can be estimated from Sequential Monte Carlo (SMC) or SMC² schemes, even in non-linear and non-Gaussian state-space models. For discrete outputs, finite difference analogues of the score preserve strict propriety and homogeneity (Shao et al., 2017).
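
The Monte Carlo idea can be illustrated on a toy latent-variable model. The sketch uses plain self-normalized importance sampling from the prior as a stand-in for SMC, and relies on the standard identities $\nabla_y \log p(y) = \mathbb{E}[\nabla_y \log p(y \mid x) \mid y]$ and $\Delta_y \log p(y) = \mathbb{E}[\Delta_y \log p(y \mid x) + \|\nabla_y \log p(y \mid x)\|^2 \mid y] - \|\mathbb{E}[\nabla_y \log p(y \mid x) \mid y]\|^2$. The model ($x \sim N(0, 1)$, $y \mid x \sim N(x, 1)$), sample size, and function names are assumptions made here so that the answer can be checked against the exact marginal $N(0, 2)$.

```python
import math
import random

def mc_hyvarinen_latent(y, n_samples=20000, seed=0):
    """Importance-sampling estimate of S_H for y | x ~ N(x, 1), x ~ N(0, 1)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]   # draws from the prior
    log_w = [-0.5 * (y - x) ** 2 for x in xs]              # log p(y | x) up to a constant
    top = max(log_w)
    w = [math.exp(l - top) for l in log_w]
    norm = sum(w)
    g = [-(y - x) for x in xs]                             # d/dy log p(y | x)
    mean_g = sum(wi * gi for wi, gi in zip(w, g)) / norm
    mean_g2 = sum(wi * gi * gi for wi, gi in zip(w, g)) / norm
    grad = mean_g                                          # d/dy log p(y)
    lap = (-1.0 + mean_g2) - mean_g**2                     # uses d2/dy2 log p(y | x) = -1
    return lap + 0.5 * grad**2

def exact_score(y):
    """Exact score of the marginal N(0, 2): -1/2 + y**2 / 8."""
    return -0.5 + y * y / 8.0

estimate = mc_hyvarinen_latent(1.0)
reference = exact_score(1.0)
```

In a state-space model the same weighted averages would be taken over particles, with the weights supplied by the filter rather than by a single importance-sampling step.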

This makes the Hyvärinen score suitable for high-dimensional or otherwise complex models, including stochastic volatility driven by Lévy processes and SDE-based population models, where its robustness to prior vagueness and invariance to normalization are essential (Shao et al., 2017).

6. Theoretical Guarantees and Practical Guidance

Consistency, efficiency, and robustness results for Hyvärinen-score–based estimators are well established:

Consistency: Minimum-score estimators are consistent and asymptotically normal, with sandwich (Godambe) variance (Columbu et al., 2019, Mameli et al., 2014).
Asymptotic Model Selection: In both i.i.d. and state-space regimes, the prequential Hyvärinen score selects the asymptotically Fisher-information–optimal model (Shao et al., 2017).
Bandwidth and Hyperparameter Rates: In kernel-based methods, tuning via the Hyvärinen score achieves established minimax rates for density estimation (Imai et al., 2022).
Robustness: Median-of-means aggregation ensures stability in the presence of outliers or heavy-tailed noise (Schwank et al., 9 Jan 2025).
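
For reference, the sandwich form mentioned under Consistency is the usual one for M-estimators. Writing $s(\theta; x) = S_H(p_\theta, x)$ and $\theta_0$ for the minimizer of the expected score (notation introduced here for illustration), and assuming i.i.d. data with standard regularity conditions,

$\sqrt{n} (\hat{\theta} - \theta_0) \xrightarrow{d} N\big(0, J^{-1} K J^{-1}\big), \quad J = \mathbb{E}_{p_\star}[\nabla_\theta^2 s(\theta_0; X)], \quad K = \operatorname{Var}_{p_\star}[\nabla_\theta s(\theta_0; X)].$

The Godambe information is $G = J K^{-1} J$, and the asymptotic variance is $G^{-1}$. In time-series settings $K$ must additionally account for serial dependence in the estimating function.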

Empirically, the Hyvärinen score is reported to perform well in multi-modal, contaminated, or likelihood-intractable settings, though its Monte Carlo variance can exceed that of likelihood-based criteria in small samples or highly non-Gaussian scenarios (Imai et al., 2022, Shao et al., 2017).

7. Summary Table: Core Formulae and Properties

Context	Hyvärinen/Score Matching Formula	Key Features
General (continuous)	$S_H(p, x) = \Delta_x \log p(x) + \frac{1}{2} \|\nabla_x \log p(x)\|^2$	Proper, local, homogeneous
Exponential family	$S_H(p_\theta, x) = \theta^\top \Delta_x t(x) + \frac{1}{2} \|\nabla_x \{\theta^\top t(x)\}\|^2$ for $p_\theta(x) \propto \exp\{\theta^\top t(x)\}$	No normalizing constant needed
Time series, Gaussian	$S_H(p_\theta, x) = -\operatorname{tr}(\Sigma_\theta^{-1}) + \frac{1}{2} x^\top \Sigma_\theta^{-2} x$ for $x \sim N(0, \Sigma_\theta)$	Avoids high-dimensional determinants
Pseudo/BART bandwidth	$\hat{h} = \arg\min_h \frac{1}{n} \sum_{i=1}^n S_H(\hat{p}_h, y_i)$	Posterior MC, no normalization
Exponentiated KDE	$(\hat{h}, \hat{\alpha}) = \arg\min_{h, \alpha} \frac{1}{n} \sum_{i=1}^n S_H(\hat{p}_{h,\alpha}^{(-i)}, x_i)$	Joint tuning of $h$, $\alpha$; IHS-LOO consistency