W²-Based Estimator Overview
- W²-Based Estimator is a set of methods that utilize the squared 2-Wasserstein distance to robustly estimate parameters across diverse statistical and physics applications.
- It minimizes discrepancies between empirical and model distributions using efficient algorithms and convex programming to ensure robust inference in settings like covariance estimation and MMSE recovery.
- In experimental physics, it reconstructs neutrino energy via the hadronic invariant mass, achieving lower bias and improved resolution compared to conventional techniques.
The term W²-based estimator encompasses a diverse set of estimation techniques that use either the squared 2-Wasserstein distance $W_2^2$ or the hadronic invariant mass squared $W^2$ as the central component of the statistical criterion. Its applications span robust parameter estimation in location–scale models, minimax optimality in distributionally robust statistics, semidefinite relaxations in graphical model inference, and precision neutrino energy reconstruction in experimental physics. This entry surveys these approaches, highlighting their theoretical underpinnings, explicit formulations, asymptotic properties, and performance characteristics.
1. Fundamental Definitions and Scope
The W²-based estimator arises in at least three distinct but formally related domains:
- Optimal Transport–Driven Estimation: Here, the estimator minimizes the squared 2-Wasserstein ($W_2^2$) distance between the empirical and parametric distributions, often within a location–scale family (Amari et al., 2020, Amari, 2020).
- Distributionally Robust Optimization (DRO): Inverse covariance or regression parameter estimation is framed as a minimax problem over Wasserstein balls, producing estimators that are robust to distributional misspecification measured in $W_2$ (Nguyen et al., 2018, Nguyen et al., 2019).
- Experimental Particle Physics: Here, $W^2$ denotes the invariant mass squared of the hadronic system. The W²-based estimator in this context reconstructs the incident neutrino energy from measured final-state hadronic kinematics (Thorpe et al., 14 Nov 2025).
Despite the diversity of domains, these estimators share the use of $W_2^2$ as a principled risk measure or of $W^2$ as a physically meaningful summary statistic.
2. W-Estimation in One-Dimensional Location–Scale Models
In the location–scale family on $\mathbb{R}$ with density $\sigma^{-1} f\!\big((x-\mu)/\sigma\big)$, the $W_2$-estimator is defined via
$$(\hat\mu,\hat\sigma)=\arg\min_{\mu\in\mathbb{R},\,\sigma>0} W_2^2\big(\hat F_n, F_{\mu,\sigma}\big),$$
where $\hat F_n$ is the empirical CDF and $F_{\mu,\sigma}$ is the model CDF with parameters $(\mu,\sigma)$.
Key properties:
- The squared Wasserstein distance between $\hat F_n$ and $F_{\mu,\sigma}$ is
  $$W_2^2\big(\hat F_n, F_{\mu,\sigma}\big)=\int_0^1\Big(\hat F_n^{-1}(u)-F_{\mu,\sigma}^{-1}(u)\Big)^2\,du,$$
  where $F^{-1}$ denotes the quantile function of $F$.
- The estimator has a closed form (a numerical sketch follows this list). For a standardized base law with zero mean, quantile function $F_{0,1}^{-1}$, and second moment $c_f=\int_0^1 F_{0,1}^{-1}(u)^2\,du$,
  $$\hat\mu=\frac{1}{n}\sum_{i=1}^n x_i,\qquad \hat\sigma=\sum_{i=1}^n w_i\,x_{(i)},\qquad w_i=\frac{1}{c_f}\int_{(i-1)/n}^{i/n}F_{0,1}^{-1}(u)\,du,$$
  with $w_i$ the precomputable weights and $x_{(1)}\le\cdots\le x_{(n)}$ the sample order statistics.
- Asymptotic normality: As $n\to\infty$,
  $$\sqrt{n}\,\big((\hat\mu,\hat\sigma)-(\mu_0,\sigma_0)\big)\xrightarrow{d}\mathcal N\big(0,\Sigma_W\big),$$
  with $\Sigma_W$ determined by moments of $f$. For the Gaussian case, this coincides with the Cramér–Rao lower bound, i.e., the estimator is Fisher-efficient in this setting (Amari et al., 2020, Amari, 2020).
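The closed form lends itself to a direct implementation. Below is a minimal sketch, not a reference implementation from the cited papers, that computes the interval weights by numerical quadrature of the base quantile; it assumes the base law is standardized with mean zero, and the function name `w2_location_scale` is illustrative.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def w2_location_scale(x, base=stats.norm()):
    """W2 estimate of (mu, sigma) in a one-dimensional location-scale family.

    Assumes the base law `base` has mean zero, so mu_hat is the sample mean
    and sigma_hat is a weighted sum of order statistics with weights
    proportional to the interval integrals of the base quantile function.
    """
    x = np.sort(np.asarray(x, dtype=float))       # order statistics x_(1) <= ... <= x_(n)
    n = len(x)
    grid = np.arange(n + 1) / n                   # interval endpoints i/n
    # Interval integrals of the base quantile function (numerical quadrature).
    w = np.array([quad(base.ppf, grid[i], grid[i + 1])[0] for i in range(n)])
    m0, v0 = base.stats(moments="mv")
    c0 = float(v0 + m0 ** 2)                      # second moment of the base law
    mu_hat = float(x.mean())
    sigma_hat = float(w @ x) / c0
    return mu_hat, sigma_hat

# Sanity check on synthetic Gaussian data: estimates should be close to (2, 3).
rng = np.random.default_rng(0)
print(w2_location_scale(rng.normal(loc=2.0, scale=3.0, size=2000)))
```

The same routine applies to other standardized base laws (e.g., `stats.t(df=5)` or `stats.logistic()`), since only the quantile function and second moment enter the weights.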
3. Distributionally Robust Inverse Covariance Estimation
The $W_2$-based estimator plays a foundational role in distributionally robust maximum likelihood inference of covariance or precision matrices. The precision matrix estimate solves the minimax problem
$$\hat X=\arg\min_{X\succ 0}\ \sup_{\mathbb Q\in\mathbb B_\rho(\hat{\mathbb P})}\ \mathbb E_{\mathbb Q}\big[\xi^\top X\,\xi\big]-\log\det X,$$
where the objective is Stein's loss (equivalently, the Gaussian negative log-likelihood up to constants) and the supremum ranges over Gaussian laws within $W_2$-radius $\rho$ of the empirical moments. This problem can be equivalently recast as a tractable semidefinite program (SDP) (Nguyen et al., 2018). In the absence of structural constraints, an analytical shrinkage solution is available: the estimator acts as a nonlinear shrinkage of the sample covariance eigenvalues and automatically guarantees invertibility, well-conditioning, rotation equivariance, and order preservation of the eigenvalues. For sparsity-constrained problems, sequential quadratic approximation (SQA) is used (Nguyen et al., 2018).
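The ambiguity set above is a ball in the $W_2$ distance between Gaussian laws, which admits a closed form (the Bures-type formula). The helper below is an illustrative sketch of that distance, not code from the cited papers; the function name is an assumption.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2_squared(mu1, Sigma1, mu2, Sigma2):
    """Squared 2-Wasserstein distance between N(mu1, Sigma1) and N(mu2, Sigma2):
    ||mu1 - mu2||^2 + tr(Sigma1 + Sigma2 - 2 (Sigma2^{1/2} Sigma1 Sigma2^{1/2})^{1/2})."""
    root2 = sqrtm(Sigma2)
    cross = sqrtm(root2 @ Sigma1 @ root2)        # matrix geometric-mean term
    bures = np.trace(Sigma1 + Sigma2 - 2.0 * np.real(cross))
    mean_part = float(np.sum((np.asarray(mu1) - np.asarray(mu2)) ** 2))
    return mean_part + float(np.real(bures))

# A Gaussian Q = N(m, S) lies in the ball of radius rho around the empirical
# Gaussian N(m_hat, S_hat) iff gaussian_w2_squared(m, S, m_hat, S_hat) <= rho**2.
print(gaussian_w2_squared(np.zeros(2), np.eye(2), np.ones(2), 2.0 * np.eye(2)))
```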
4. W-Robust MMSE Estimation via DRO
In signal recovery under the linear observation model $y=Hx+w$, a distributionally robust estimator is constructed by minimizing the worst-case mean squared error over independent Wasserstein balls centered at candidate normal priors for both the signal $x$ and the noise $w$:
$$\hat\psi=\arg\min_{\psi}\ \sup_{(\mathbb P_x,\mathbb P_w)\in\mathbb B}\ \mathbb E\big[\|x-\psi(y)\|_2^2\big],$$
where $\mathbb B$ is a product Wasserstein ball. The saddle point is attained at an affine mapping and a Gaussian least-favorable prior. The corresponding parameters are computed by solving an SDP (or efficiently with a Frank–Wolfe scheme), with the worst-case covariances determined via convex maximization constrained by the Wasserstein radii (Nguyen et al., 2019).
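For orientation, the nominal (non-robust) problem already has an affine solution; the DRO version keeps this affine structure but replaces the nominal covariances with least-favorable ones found via the SDP or Frank–Wolfe step. The sketch below shows only the nominal affine MMSE map under the assumed model $y = Hx + w$; the function name and example values are illustrative.

```python
import numpy as np

def affine_mmse(H, mu_x, Sigma_x, Sigma_w):
    """Nominal affine MMSE estimator for y = H x + w with x ~ N(mu_x, Sigma_x)
    and zero-mean noise w ~ N(0, Sigma_w).  Returns (A, b) with x_hat(y) = A y + b."""
    S_y = H @ Sigma_x @ H.T + Sigma_w           # covariance of the observation y
    A = Sigma_x @ H.T @ np.linalg.inv(S_y)      # gain: Cov(x, y) Cov(y)^{-1}
    b = mu_x - A @ (H @ mu_x)                   # offset so that E[x_hat] = mu_x
    return A, b

# Example: 2-dimensional signal observed through a random 3x2 channel.
rng = np.random.default_rng(1)
H = rng.normal(size=(3, 2))
A, b = affine_mmse(H, mu_x=np.zeros(2), Sigma_x=np.eye(2), Sigma_w=0.1 * np.eye(3))
print(A @ rng.normal(size=3) + b)
```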
5. W-Based Neutrino Energy Estimator in LArTPCs
In experimental neutrino physics, $W^2$ is the visible hadronic invariant mass squared,
$$W^2=\Big(\sum_i E_i\Big)^2-\Big|\sum_i \vec p_i\Big|^2,$$
with the sums running over the reconstructed final-state hadrons. The W²-based estimator for the incident neutrino energy follows from two-body kinematics (Thorpe et al., 14 Nov 2025):
$$E_\nu=\frac{W^2-m_\ell^2-M_X^2+2M_X E_\ell}{2\big(M_X-E_\ell+p_\ell\cos\theta_\ell\big)},\qquad M_X=N_p\,(m_n-B),$$
where $N_p$ is the count of detected protons, $m_n$ the neutron mass, $B$ a binding-energy correction, and $E_\ell$, $p_\ell$, $\theta_\ell$ the charged-lepton energy, momentum, and scattering angle. This estimator is robust across energy regimes, yields one of the smallest average biases (approximately 2% over 0.5–6 GeV), and is relatively insensitive to hadronic-modeling systematics and final-state interactions compared to traditional calorimetric or muon-kinematics-based methods. It is particularly suited to analyses in Liquid Argon Time Projection Chambers (LArTPCs) seeking to optimize both resolution and control of systematic uncertainties (Thorpe et al., 14 Nov 2025). A code sketch follows the comparison table below.
| Method | Average Bias | Resolution |
|---|---|---|
| CCQE-like | 15% | 30% |
| W-based | 2% | 18% |
| Proton-based | 5% | 8% |
| Calorimetric | 4% | 14–20% |
| Sobczyk–Furmanski (SF) | 1% | 5% |
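A minimal sketch of the reconstruction chain described above. The two-body-kinematics form below, with an effective target mass $M_X = N_p(m_n - B)$, is a plausible reading of the estimator rather than a formula quoted from Thorpe et al. (14 Nov 2025); the numerical neutron mass and binding-energy correction are assumed values.

```python
import numpy as np

M_NEUTRON = 0.93957   # neutron mass in GeV (assumed constant)
BINDING_E = 0.030     # illustrative binding-energy correction in GeV (detector-specific)

def hadronic_w2(hadron_four_momenta):
    """Visible hadronic invariant mass squared from final-state hadron
    four-momenta given as rows (E, px, py, pz) in GeV."""
    p = np.sum(np.asarray(hadron_four_momenta, dtype=float), axis=0)
    return p[0] ** 2 - np.sum(p[1:] ** 2)

def enu_w_based(w2, n_protons, e_lep, p_lep, cos_theta_lep):
    """W^2-based neutrino-energy estimate from two-body kinematics, treating the
    struck system as n_protons bound neutrons at rest with effective mass
    M_X = N_p * (m_n - B).  This specific form is a reconstruction, not code
    taken from the cited paper."""
    m_x = n_protons * (M_NEUTRON - BINDING_E)
    m_lep2 = e_lep ** 2 - p_lep ** 2                     # lepton mass^2 from its observables
    num = w2 - m_lep2 - m_x ** 2 + 2.0 * m_x * e_lep
    den = 2.0 * (m_x - e_lep + p_lep * cos_theta_lep)
    return num / den

# Example: one proton plus one charged pion in the final state, muon kinematics in GeV.
hadrons = [(1.05, 0.10, 0.05, 0.40), (0.30, 0.02, -0.01, 0.25)]
w2 = hadronic_w2(hadrons)
print(w2, enu_w_based(w2, n_protons=1, e_lep=1.2, p_lep=1.195, cos_theta_lep=0.95))
```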
6. Computational and Methodological Considerations
In one-dimensional models, $W_2$-based estimators are computationally efficient: sorting and a weighted sum suffice, and for standard reference laws (normal, Student's $t$, logistic), all required weights can be precomputed or tabulated (Amari et al., 2020, Amari, 2020). In high-dimensional covariance or regression settings, Wasserstein-DRO leads to convex programs or SDPs that can be solved efficiently with specialized solvers, leveraging structure or iterative algorithms such as Frank–Wolfe (Nguyen et al., 2018, Nguyen et al., 2019).
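For reference laws with tractable quantile antiderivatives, the interval weights can be written in closed form and tabulated once per sample size. The sketch below does so for the standard normal (antiderivative $-\varphi(\Phi^{-1}(u))$) and the standard logistic (antiderivative $u\ln u + (1-u)\ln(1-u)$); the function names are illustrative, and laws such as Student's $t$ would fall back on quadrature.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import xlogy

def normal_weights(n):
    """Exact interval integrals of the standard normal quantile:
    the antiderivative of Phi^{-1}(u) is -phi(Phi^{-1}(u))."""
    u = np.arange(n + 1) / n
    a = norm.pdf(norm.ppf(u))                 # phi(Phi^{-1}(u)); equals 0 at u = 0 and u = 1
    return a[:-1] - a[1:]

def logistic_weights(n):
    """Exact interval integrals of the standard logistic quantile log(u/(1-u)),
    via the antiderivative A(u) = u log u + (1-u) log(1-u)."""
    u = np.arange(n + 1) / n
    A = xlogy(u, u) + xlogy(1 - u, 1 - u)     # xlogy handles the u = 0, 1 endpoints
    return A[1:] - A[:-1]

# Tabulate once per sample size n; the scale estimate is (weights @ sorted_sample) / c0,
# with c0 the base law's second moment (1 for the normal, pi^2 / 3 for the logistic).
print(normal_weights(4))
print(logistic_weights(4))
```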
For the LArTPC application, reconstructing $W^2$ requires accurate identification of charged hadrons above detection thresholds; events without reconstructed protons are excluded from this estimator and handled by alternatives (e.g., calorimetry or exclusive-channel treatments) (Thorpe et al., 14 Nov 2025).
7. Connections, Theoretical Properties, and Applicability
The W-based estimator framework possesses several general features:
- Robustness: By optimizing over Wasserstein balls, estimators exhibit controlled sensitivity to distributional shifts or modeling inaccuracies—especially valuable in contexts featuring complex tails or systemic misspecification (Nguyen et al., 2018, Nguyen et al., 2019, Thorpe et al., 14 Nov 2025).
- Asymptotic Validity: In canonical settings, W-based estimators are consistent and yield explicit asymptotic distributions; Fisher efficiency is attained in the Gaussian location–scale case (Amari et al., 2020, Amari, 2020).
- Regularization Properties: In precision matrix estimation, Wasserstein-based shrinkage regularizes the spectrum, ensuring invertibility and well-conditioning without explicit constraints (Nguyen et al., 2018).
- Physical Interpretability: In experimental reconstruction (e.g., neutrino physics), $W^2$ corresponds directly to a measured invariant mass, grounding the estimator in observable quantities (Thorpe et al., 14 Nov 2025).
Applicability domains include robust parametric statistics, graphical model learning, signal processing under distributional uncertainty, and experimental high-energy physics. The estimator is particularly valuable where robustness to model errors and systematic uncertainties is at a premium, as well as in hybrid schemes combining orthogonal estimation criteria for optimal coverage and minimal bias.