
Best Gaussian Approximation Methods

Updated 30 December 2025
  • Best Gaussian approximation is the optimal strategy of selecting a Gaussian measure that minimizes divergence metrics, such as KL and Wasserstein, from a target distribution.
  • It leverages variational and minimax principles to derive precise error bounds and convergence rates, especially in high-dimensional or dependent data settings.
  • Practical algorithms—including finite mixtures, normalizing flows, and geometric mappings—ensure robust and computationally efficient Gaussian approximations.

The best Gaussian approximation refers to optimal strategies, rates, and algorithms for approximating a target probability law, process, dataset, or function by a Gaussian distribution or a mixture thereof, under rigorous metrics such as Kullback–Leibler divergence, total variation, Wasserstein distance, or $L^2$ norm. This is foundational in statistical inference, signal processing, machine learning, and Bayesian inverse problems, and exhibits deep connections to optimal transport, information geometry, approximation theory, and empirical process theory. Recent research provides minimax rates, constructive algorithms, and precise error bounds for high-dimensional, dependent, and anisotropic settings.

1. Variational and Minimax Principles for Gaussian Approximation

The canonical definition of best Gaussian approximation is the minimizer of the Kullback–Leibler divergence (KL) from a class of Gaussian laws to a target probability law $\mu$ on $\mathbb{R}^d$ or, more generally, on a Hilbert space. For a target $\mu_\varepsilon$ with density proportional to $\exp(-V_1^\varepsilon(x)/\varepsilon - V_2(x))$, the optimal Gaussian $\nu^* = N(m^*, \Sigma^*)$ satisfies

$$(m^\ast,\Sigma^\ast) = \arg\min_{(m,\Sigma)\in\mathbb{R}^d\times\mathcal{S}^+_d} D_{\mathrm{KL}}\bigl(N(m,\Sigma)\,\Vert\,\mu_\varepsilon\bigr)$$

where $D_{\mathrm{KL}}$ is the KL divergence. The explicit gradient conditions yield

$$\nabla_m D_{\mathrm{KL}} = \frac{1}{\varepsilon}\,\mathbb{E}_{N(m,\Sigma)}[\nabla V_1^\varepsilon(X)] + \mathbb{E}_{N(m,\Sigma)}[\nabla V_2(X)] = 0$$

$$\partial_\Sigma D_{\mathrm{KL}} = \frac{1}{2\varepsilon}\,\mathbb{E}_{N(m,\Sigma)}[D^2 V_1^\varepsilon(X)] + \frac{1}{2}\,\mathbb{E}_{N(m,\Sigma)}[D^2 V_2(X)] - \frac{1}{2}\Sigma^{-1} = 0$$

These conditions generalize to infinite-dimensional function spaces, where the best Gaussian $\nu = N(m, C)$ minimizes $D_{\mathrm{KL}}$ subject to equivalence with a reference Gaussian $\mu_0$ (Pinski et al., 2014; Lu et al., 2016).
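
In finite dimensions these stationarity conditions can be solved directly by a simple fixed-point scheme. The sketch below is a minimal illustration with a hypothetical toy potential $V$ (collapsing $V_1^\varepsilon/\varepsilon + V_2$ into one function) and a Monte Carlo Newton-style update; it is not an algorithm from the cited papers. It alternates between estimating $\mathbb{E}[\nabla V]$ and $\mathbb{E}[D^2 V]$ under the current Gaussian and updating $(m,\Sigma)$ from the stationarity conditions.

```python
import numpy as np

# Hypothetical toy target density p(x) ∝ exp(-V(x)) on R^2.
A = np.array([[2.0, 0.5], [0.5, 1.0]])

def grad_V(x):            # ∇V(x) for V(x) = 0.5 x^T A x + 0.1 * sum(x^4)
    return A @ x + 0.4 * x**3

def hess_V(x):            # D^2 V(x)
    return A + np.diag(1.2 * x**2)

def best_gaussian_kl(m, Sigma, n_samples=2000, n_iters=50, seed=0):
    """Fixed-point iteration on the first-order conditions
    E[grad V(X)] = 0 and Sigma^{-1} = E[D^2 V(X)] with X ~ N(m, Sigma)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        X = rng.multivariate_normal(m, Sigma, size=n_samples)
        mean_grad = np.mean([grad_V(x) for x in X], axis=0)
        mean_hess = np.mean([hess_V(x) for x in X], axis=0)
        Sigma = np.linalg.inv(mean_hess)       # covariance update from stationarity in Sigma
        m = m - Sigma @ mean_grad              # Newton-style mean update
    return m, Sigma

m_star, Sigma_star = best_gaussian_kl(np.zeros(2), np.eye(2))
print(m_star, Sigma_star)
```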

2. Rates and Optimality in High Dimensions and Dependencies

For a sequence of i.i.d. or dependent random vectors $X_1,\dots,X_n\in\mathbb{R}^p$, best Gaussian approximation bounds sharpen classical strong-coupling results. Under a finite $p$th moment and short-range dependence $\chi>0$ (functional dependence measure), the minimax coupling rate interpolates between $n^{1/2}$ (slow decay) and $n^{1/p}$ (rapid decay), generalizing the Komlós–Major–Tusnády and Bentkus–Chernozhukov lower bounds:

$$\max_{1\le i\le n} |S_i - G_i| = o_P\bigl(n^{1/r(\chi,p)}\bigr)$$

where $r(\chi,p)$ is explicit (Karmakar et al., 2020). In high dimensions ($p\gg n$), for independent mean-zero vectors under a restricted sub-Gaussian norm, one achieves

$$\|S_n - Z\| = O_{\mathbb{P}}\bigl(p^{3/2}/\sqrt{n}\bigr)$$

uniformly in $p$ and $n$, with explicit constants, closing the gap left by previous dimension-dependent CLT results (Buzun et al., 2021).
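
As a rough numerical illustration of how the Gaussian approximation error behaves as $n$ grows for fixed $p$ (a simulation sketch with hypothetical distributional choices, not the coupling constructions of the cited papers), one can estimate a Kolmogorov-type distance between the norm of the normalized sum and the norm of its Gaussian limit:

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_approx_error(n, p, n_rep=2000):
    """Monte Carlo estimate of the Kolmogorov distance between ||S_n / sqrt(n)||
    and ||Z|| for i.i.d. centered exponential coordinates (Z the Gaussian limit)."""
    X = rng.exponential(1.0, size=(n_rep, n, p)) - 1.0   # mean-zero, unit-variance entries
    S = X.sum(axis=1) / np.sqrt(n)                        # normalized sums, shape (n_rep, p)
    Z = rng.standard_normal((n_rep, p))                   # samples from the Gaussian limit
    a = np.sort(np.linalg.norm(S, axis=1))
    b = np.sort(np.linalg.norm(Z, axis=1))
    grid = np.linspace(0.0, max(a[-1], b[-1]), 512)
    F_a = np.searchsorted(a, grid) / n_rep                # empirical CDF of ||S_n/sqrt(n)||
    F_b = np.searchsorted(b, grid) / n_rep                # empirical CDF of ||Z||
    return np.max(np.abs(F_a - F_b))

for n in (50, 200, 800):
    print(n, gaussian_approx_error(n, p=10))
```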

3. Empirical Approximation and Complexity Floor

The empirical approximation of a standard Gaussian law in $\mathbb{R}^d$ by its empirical counterpart over a (potentially highly structured) subset $A\subset S^{d-1}$ yields an optimally tight uniform bound:

$$\sup_{x\in A,\,t\in\mathbb{R}} \left| F_{m,x}(t) - \Phi(t) \right| \leq \Delta + \sigma(t)\sqrt{\Delta}$$

where $F_{m,x}(t)$ is the empirical CDF of the one-dimensional marginals along the direction $x$, $\sigma(t)=\sqrt{\Phi(t)(1-\Phi(t))}$, and $\Delta \geq \Delta_0 \approx \gamma_1(A)/m$ with $\gamma_1(A)$ Talagrand's complexity functional. Both the error form and the threshold are minimax optimal, firmly linking approximation rates to the geometric complexity of the target set. This analysis further yields Wasserstein-2 ($\mathcal{W}_2$) bounds with precise quantile–coordinate rigidity for random Gaussian embeddings (Bartl et al., 2023).
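
The quantity in the bound above is easy to probe numerically. The sketch below is an illustration only, with a finite set of random unit directions standing in for a structured $A$; it computes $\sup_{x\in A,\,t}|F_{m,x}(t)-\Phi(t)|$ from a standard Gaussian sample.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def empirical_gaussian_gap(m=5000, d=20, n_dirs=200):
    """Sup over directions x in a finite A ⊂ S^{d-1} and thresholds t of
    |F_{m,x}(t) - Phi(t)|, where F_{m,x} is the empirical CDF of <X_i, x>."""
    X = rng.standard_normal((m, d))                      # m standard Gaussian samples in R^d
    A = rng.standard_normal((n_dirs, d))
    A /= np.linalg.norm(A, axis=1, keepdims=True)        # random directions on the unit sphere
    proj = np.sort(X @ A.T, axis=0)                      # sorted marginals, shape (m, n_dirs)
    Phi = norm.cdf(proj)
    # The sup over t is attained at order statistics, where the empirical CDF
    # jumps from (i-1)/m to i/m; check both sides of each jump.
    ranks_hi = (np.arange(1, m + 1) / m)[:, None]
    ranks_lo = (np.arange(0, m) / m)[:, None]
    gap = np.maximum(np.abs(ranks_hi - Phi), np.abs(ranks_lo - Phi))
    return gap.max()

print(empirical_gaussian_gap())
```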

4. Best Approximation by Finite Gaussian Mixtures

For arbitrary location–mixtures of Gaussians, the best finite $m$-component mixture approximation within $f$-divergence error $\epsilon$ is characterized by tail properties of the mixing law:

  • Compactly supported mixing distribution on $[-M,M]$:

$$m^*(\epsilon) \asymp \frac{\log(1/\epsilon)}{\log\bigl(1 + \tfrac{1}{M}\sqrt{\log(1/\epsilon)}\bigr)}$$

  • Subgaussian tail parameter $\sigma$:

$$m^*(\epsilon) \asymp \sigma\,\log(1/\epsilon)$$

  • Subexponential tail parameter $\beta$:

$$m^*(\epsilon) \asymp \beta\,(\log(1/\epsilon))^{3/2}$$

Attainability leverages local moment matching and Gauss quadrature, while converses derive from low-rank spectral analysis of trigonometric moment matrices and Toeplitz operators. These rates correct prior errors in the $m/\sigma^2$ exponents for Gaussian–Gaussian mixture approximation (Ma et al., 2024).
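
The attainability direction can be illustrated with Gauss–Hermite quadrature of the mixing law: quadrature nodes become component means and quadrature weights become mixture weights. The following sketch assumes a Gaussian mixing law $N(0,s^2)$ as a toy example and conveys the quadrature idea rather than the exact construction of Ma et al. (2024); the continuous mixture in this case is simply $N(0, 1+s^2)$.

```python
import numpy as np

def gh_mixture_approx(m, s=1.0):
    """m-component mixture approximation of the Gaussian location mixture with
    mixing law N(0, s^2), built from a Gauss-Hermite quadrature of the mixing law."""
    nodes, weights = np.polynomial.hermite.hermgauss(m)   # physicists' Gauss-Hermite rule
    means = np.sqrt(2.0) * s * nodes                       # nodes mapped to the N(0, s^2) scale
    probs = weights / np.sqrt(np.pi)                        # weights normalized to sum to 1
    return means, probs

def mixture_density(x, means, probs):
    """Density of the finite mixture sum_j probs[j] * N(means[j], 1) at points x."""
    x = np.atleast_1d(x)[:, None]
    return (probs * np.exp(-0.5 * (x - means) ** 2) / np.sqrt(2 * np.pi)).sum(axis=1)

s, m = 1.0, 8
means, probs = gh_mixture_approx(m, s)
x = np.linspace(-6, 6, 400)
target = np.exp(-0.5 * x**2 / (1 + s**2)) / np.sqrt(2 * np.pi * (1 + s**2))  # N(0, 1+s^2)
print("sup density error:", np.abs(mixture_density(x, means, probs) - target).max())
```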

5. Geometric and Universal Gaussian Approximation

Approximating general laws via pushforwards of Gaussians under diffeomorphisms (“ReparamGA”) or Riemannian exponential maps (“RiemannGA”) yields universal expressivity:

$$\forall\, p(x) > 0\ \exists\,\phi\ \text{diffeomorphism:}\quad \phi_*\,\mathcal{N}(0,I) = p(x)$$

The construction employs the Rosenblatt transform and is exact for smooth positive densities. While a single universal mapping for a family $\{p_\alpha\}$ is obstructed by Chentsov's theorem, minimizing the expected divergence over a family yields nearly best geometric approximations. Practical algorithms are now built around normalizing flows (learned diffeomorphisms) and geometric Laplace approximation, balancing tractability and expressive power (Costa et al., 1 Jul 2025).
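
In one dimension the Rosenblatt construction reduces to composing the target quantile function with the standard normal CDF, $\phi = F^{-1}\circ\Phi$. Below is a minimal sketch with an Exponential(1) target chosen purely for illustration; the multivariate case composes conditional quantile maps coordinate by coordinate.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

def pushforward_map(target_ppf):
    """1-D instance of the Rosenblatt construction: phi = F_target^{-1} o Phi
    pushes N(0, 1) forward to the target law exactly."""
    return lambda z: target_ppf(norm.cdf(z))

# Illustrative target: Exponential(1), with quantile function -log(1 - u).
exp_ppf = lambda u: -np.log1p(-u)
phi = pushforward_map(exp_ppf)

z = rng.standard_normal(100_000)        # samples from the base Gaussian
x = phi(z)                              # pushforward samples, distributed Exp(1)
print("mean ≈ 1, var ≈ 1:", x.mean(), x.var())
```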

6. Gaussian Approximation for Diffusions, Processes, and Master Equations

For small-noise diffusions, the KL-optimal Gaussian approximation aligns the mean and covariance with solutions to deterministic and Lyapunov ODEs, driving the leading-order KL divergence to $O(\varepsilon)$ for noise parameter $\varepsilon$. The error in total variation is $O(\varepsilon^{1/2})$, and practical computation leverages closed-form ODE recursions for the mean and variance (Sanz-Alonso et al., 2016). Similar advantages apply to master equations for Markov jump processes, where Gaussian closure reduces the mean error from $O(\Omega^0)$ (van Kampen) to $O(\Omega^{-1/2})$ for system size $\Omega$, with the variance preserved at $O(\Omega^{1/2})$ error across both methods (Lafuerza et al., 2010).
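
Concretely, for a small-noise SDE $dX = f(X)\,dt + \sqrt{\varepsilon}\,dW$ the approximating Gaussian follows the mean ODE $\dot m = f(m)$ and the Lyapunov equation $\dot\Sigma = Df(m)\Sigma + \Sigma Df(m)^\top + \varepsilon I$. The sketch below uses a forward-Euler integrator and a hypothetical toy drift; it illustrates the ODE structure rather than the specific scheme of the cited paper.

```python
import numpy as np

def gaussian_ode_approx(f, Df, m0, S0, eps, T=1.0, dt=1e-3):
    """Integrate the mean/covariance ODEs for dX = f(X) dt + sqrt(eps) dW:
         dm/dt = f(m),   dS/dt = Df(m) S + S Df(m)^T + eps * I."""
    m, S = np.array(m0, float), np.array(S0, float)
    I = np.eye(len(m))
    for _ in range(int(T / dt)):
        J = Df(m)
        m = m + dt * f(m)                       # mean ODE step
        S = S + dt * (J @ S + S @ J.T + eps * I)  # Lyapunov equation step
    return m, S

# Hypothetical toy drift: linear relaxation to the origin plus a weak cubic term.
f  = lambda x: -x - 0.1 * x**3
Df = lambda x: -np.eye(len(x)) - np.diag(0.3 * x**2)

m_T, S_T = gaussian_ode_approx(f, Df, m0=[1.0, -0.5], S0=np.zeros((2, 2)), eps=0.01)
print(m_T, S_T)
```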

7. Approximation of Alpha–Stable and Non-Gaussian Laws

For $\alpha$-stable distributions, the LePage series expansion yields a “truncation + Gaussian tail” approximation, minimizing the Kolmogorov distance to $O(1/c)$ where $c$ is the truncation level. This leads to sharply computable error bounds for inference:

$$\Delta(X,\widehat{X}) \leq B_5(c,\alpha,N)$$

where $\widehat{X}$ denotes the truncated series plus matched Gaussian tail. This approach uniformly outperforms pure truncation and mixture-of-normals when $\alpha<2$ (Riabiz et al., 2018).
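
A heavily simplified sketch of the truncation-plus-Gaussian-tail idea for the symmetric case follows: Poisson arrival times up to level $c$ drive the retained LePage terms, and the discarded tail is replaced by a mean-zero Gaussian whose variance matches $\int_c^\infty t^{-2/\alpha}\,dt$. Normalization constants and the explicit bound $B_5$ are omitted; see Riabiz et al. (2018) for the exact construction.

```python
import numpy as np

rng = np.random.default_rng(4)

def lepage_truncated_sample(alpha, c, size=1):
    """Symmetric alpha-stable sample (up to a scale constant) via the LePage series
        X = sum_i eps_i * Gamma_i^{-1/alpha},
    truncated at Poisson arrival times Gamma_i <= c, with the discarded tail replaced
    by a mean-zero Gaussian of matched variance c^{1 - 2/alpha} / (2/alpha - 1)."""
    out = np.empty(size)
    tail_var = c ** (1.0 - 2.0 / alpha) / (2.0 / alpha - 1.0)
    for k in range(size):
        # Unit-rate Poisson arrival times below the truncation level c.
        gammas = np.cumsum(rng.exponential(1.0, size=int(3 * c) + 20))
        gammas = gammas[gammas <= c]
        signs = rng.choice([-1.0, 1.0], size=gammas.size)
        out[k] = np.sum(signs * gammas ** (-1.0 / alpha)) \
                 + np.sqrt(tail_var) * rng.standard_normal()
    return out

print(lepage_truncated_sample(alpha=1.5, c=50.0, size=5))
```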

8. Structural, Algorithmic, and Error Analysis Tools

Several algorithmic paradigms are now established:

  • Sum-of-exponentials rational approximations achieve near-optimal geometric error decay ($O(7.5^{-N})$ for $N$ modes) for 1D Gaussian kernel transforms (Jiang, 2019).
  • Separable, area-matching plus weighted least squares fitting provides efficient and accurate Gaussian parameter estimation for sampled data, delivering a closed-form $\sigma$ and robust iterative schemes (Al-Nahhal et al., 2019); a generic fitting sketch appears after this list.
  • $N$-term Gaussian mixtures in $L_2(\mathbb{R}^2)$ can universally match curvelet sparsity rates and are “universal” for anisotropic classes, via two-stage approximation and a Fourier-domain analysis that exploits vanishing moments and parabolic scaling (Erb et al., 2019).
  • Moment-matching and Gauss–Hermite quadrature crucially surpass naive truncation for compact approximation, reducing the Laplace-transform error from $e^{-\Theta(a^2)}$ to $e^{-\Theta(a^2\log a)}$, which enables “super-flat” mixtures with uniformly bounded derivatives (Polyanskiy et al., 2020).
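
As a generic illustration of least-squares Gaussian fitting of sampled data (a Guo-style weighted log-domain fit with hypothetical data, not the exact area-matching algorithm of Al-Nahhal et al., 2019):

```python
import numpy as np

def fit_gaussian_wls(x, y):
    """Fit y ≈ A * exp(-(x - mu)^2 / (2 sigma^2)) by weighted least squares on log(y),
    using weights y (rows scaled by y) to damp the influence of noisy small samples."""
    mask = y > 0
    x, y = x[mask], y[mask]
    # Linear model: log y = a + b*x + c*x^2, with c = -1/(2 sigma^2), b = mu/sigma^2.
    W = y
    M = np.stack([np.ones_like(x), x, x * x], axis=1) * W[:, None]
    rhs = W * np.log(y)
    a, b, c = np.linalg.lstsq(M, rhs, rcond=None)[0]
    sigma = np.sqrt(-1.0 / (2.0 * c))
    mu = b * sigma**2
    A = np.exp(a + mu**2 / (2.0 * sigma**2))
    return A, mu, sigma

# Hypothetical noisy sampled Gaussian pulse.
rng = np.random.default_rng(5)
xs = np.linspace(-3, 5, 200)
ys = 2.0 * np.exp(-(xs - 1.0) ** 2 / (2 * 0.7**2)) + 0.01 * rng.standard_normal(xs.size)
print(fit_gaussian_wls(xs, ys))   # should recover roughly (A, mu, sigma) = (2.0, 1.0, 0.7)
```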

9. Outlook and Open Problems

Research continues on the explicit determination of constants in the mean and tail exponents for best approximation rates, on extensions to multidimensional and general location–scale mixtures (where moment–tensor complexity grows), and on nonconvexity and mode-capture issues arising from nonuniqueness in infinite-dimensional KL minimization. Further connections to optimal transport rigidity, empirical process minimax bounds, information geometry, and scalable algorithms remain central for high-dimensional Bayesian inference, machine learning model compression, and functional data analysis.


Selected Table: Minimax Rates for Gaussian Mixture Approximation (Ma et al., 2024)

| Mixing law class | Minimal $m^*(\epsilon)$ for error $\leq\epsilon$ | Typical application |
|---|---|---|
| $[-M,M]$ (compact) | $\frac{\log(1/\epsilon)}{\log(1+\frac{1}{M}\sqrt{\log(1/\epsilon)})}$ | Signal constellations, quadrature |
| $\Vert X\Vert_{\psi_2}\leq\sigma$ (subgaussian) | $\sigma\log(1/\epsilon)$ | Channel noise, robust statistics |
| $\Vert X\Vert_{\psi_1}\leq\beta$ (subexponential) | $\beta(\log(1/\epsilon))^{3/2}$ | Heavy-tailed processes |

This summary integrates the current state of theory, practical schemes, sharp bounds, and geometric insights for the best Gaussian approximation in measure, data, function, and process spaces.
