Approximate Stein Classes for Truncated Density Estimation (2306.00602v2)

Published 1 Jun 2023 in stat.ML, cs.LG, and stat.ME

Abstract: Estimating truncated density models is difficult, as these models have intractable normalising constants and hard to satisfy boundary conditions. Score matching can be adapted to solve the truncated density estimation problem, but requires a continuous weighting function which takes zero at the boundary and is positive elsewhere. Evaluation of such a weighting function (and its gradient) often requires a closed-form expression of the truncation boundary and finding a solution to a complicated optimisation problem. In this paper, we propose approximate Stein classes, which in turn leads to a relaxed Stein identity for truncated density estimation. We develop a novel discrepancy measure, truncated kernelised Stein discrepancy (TKSD), which does not require fixing a weighting function in advance, and can be evaluated using only samples on the boundary. We estimate a truncated density model by minimising the Lagrangian dual of TKSD. Finally, experiments show the accuracy of our method to be an improvement over previous works even without the explicit functional form of the boundary.

Summary

  • The paper introduces approximate Stein classes by relaxing Stein’s identity to enable robust truncated density estimation under unknown boundaries.
  • It develops the Truncated Kernelised Stein Discrepancy (TKSD), whose closed-form solution separates a kernelised Stein divergence term from the terms that account for truncation.
  • Extensive empirical evaluations confirm the method’s consistency, computational efficiency, and superiority over traditional truncated estimation techniques.

A Formal Analysis of Approximate Stein Classes for Truncated Density Estimation

The paper presents a novel approach to estimating truncated density models, focusing on settings where the boundary of the truncation domain is not known in explicit functional form. Traditional methods such as maximum likelihood estimation (MLE) fail here because the normalising constant is intractable and the required boundary conditions are difficult to satisfy. The concept of approximate Stein classes introduced in the paper provides a framework for flexible density estimation in these truncated settings.
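
To make the difficulty concrete, a truncated model restricts an unnormalised density to a domain V, so its likelihood involves an integral over V (the notation below is illustrative and not taken from the paper):

    \[
      p_{V,\theta}(x) \;=\; \frac{\bar{p}_\theta(x)}{\int_V \bar{p}_\theta(y)\,\mathrm{d}y},
      \qquad x \in V .
    \]

MLE needs this normalising integral, which depends on the boundary of V and is rarely available in closed form; score-based approaches such as score matching and the Stein-discrepancy method studied here sidestep it by working with the gradient of the log-density instead.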

Key Contributions

  1. Approximate Stein Classes: The authors adapt the classical Stein class by introducing approximate Stein classes, which relax the strict requirement that Stein's identity hold exactly. Instead, the identity is only required to hold approximately, with the error bounded by a sequence that decreases to zero in probability. This relaxation makes the framework usable in settings where precise boundary conditions are difficult to enforce.
  2. Truncated Kernelised Stein Discrepancy (TKSD): Building on approximate Stein classes, the paper develops TKSD, a new discrepancy measure for truncated density estimation. Unlike prior methods, TKSD does not require a weighting function to be fixed in advance and can be evaluated using only samples on the boundary, so no explicit functional form of the boundary is needed.
  3. Analytic Solution for TKSD: The authors provide a closed-form solution for TKSD, using a Lagrangian dual approach to handle the constraints imposed by the approximate Stein class. The solution separates a kernelised Stein discrepancy term, analogous to the classical KSD, from the terms that account for the truncated nature of the data (a minimal sketch of the classical KSD appears after this list).
  4. Empirical Evaluation: The paper conducts extensive empirical evaluations demonstrating the robustness and accuracy of TKSD relative to traditional methods like Truncated Score Matching (TruncSM) and bounded-domain KSD (bd-KSD). Notably, TKSD maintains performance advantages even with fewer boundary points, showcasing its computational efficiency and practical utility in high-dimensional spaces.
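
To fix ideas about the classical KSD term referred to in contribution 3, the following is a minimal numerical sketch of that classical quantity only: a V-statistic estimate of KSD with an RBF kernel for a model whose score function is known. It is not the paper's TKSD; there are no boundary samples, no approximate Stein class constraint, and no Lagrangian dual, and all names below are illustrative.

    # Minimal sketch: V-statistic estimate of the classical kernelised Stein
    # discrepancy (KSD) with an RBF kernel. This is the un-truncated quantity
    # that TKSD builds on, not the paper's estimator.
    import numpy as np

    def rbf_terms(X, sigma):
        """Kernel matrix and the derivative terms needed for the Stein kernel."""
        diffs = X[:, None, :] - X[None, :, :]            # (n, n, d): x_i - x_j
        sq = np.sum(diffs ** 2, axis=-1)                 # ||x_i - x_j||^2
        K = np.exp(-sq / (2 * sigma ** 2))               # k(x_i, x_j)
        grad_x = -diffs / sigma ** 2 * K[..., None]      # d k / d x_i
        grad_y = -grad_x                                 # d k / d x_j
        d = X.shape[1]
        trace_xy = K * (d / sigma ** 2 - sq / sigma ** 4)  # tr(d^2 k / dx dy)
        return K, grad_x, grad_y, trace_xy

    def ksd_vstat(X, score, sigma=1.0):
        """Average the Stein kernel u_p(x_i, x_j) over all sample pairs."""
        S = score(X)                                     # (n, d) model scores
        K, grad_x, grad_y, trace_xy = rbf_terms(X, sigma)
        term1 = (S @ S.T) * K                            # s(x)^T s(y) k(x, y)
        term2 = np.einsum('id,ijd->ij', S, grad_y)       # s(x)^T d_y k(x, y)
        term3 = np.einsum('jd,ijd->ij', S, grad_x)       # s(y)^T d_x k(x, y)
        return (term1 + term2 + term3 + trace_xy).mean()

    # Example: data from a standard normal, whose score is simply -x.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    print(ksd_vstat(X, score=lambda X: -X))              # small value near zero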

Theoretical Insights and Implications

The introduction of approximate Stein classes expands the theoretical landscape of density estimation by allowing Stein's identity to be imposed only approximately rather than exactly. This is a meaningful shift for statistical modelling of truncated data: it yields a consistent, practical estimator that does not require a known functional form of the truncation boundary.
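
Concretely, writing A_p for a Stein operator of the model density p and f for a test function (notation chosen here for illustration, not taken from the paper), the exact and relaxed requirements can be contrasted as

    \[
      \mathbb{E}_{x \sim p}\!\left[\mathcal{A}_p f(x)\right] = 0
      \qquad \text{(exact Stein identity, classical Stein class)}
    \]
    \[
      \bigl|\mathbb{E}_{x \sim p}\!\left[\mathcal{A}_p f(x)\right]\bigr| \le \varepsilon_n,
      \quad \varepsilon_n \to 0 \text{ in probability}
      \qquad \text{(approximate Stein class)} .
    \]

Intuitively, the slack ε_n absorbs boundary terms that an exact Stein class on a truncated domain would otherwise have to cancel through a weighting function vanishing on the boundary.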

Consistency and Convergence

The authors rigorously prove the consistency of their estimator, showing that the parameter estimation error vanishes asymptotically as both the sample size and the quality of the boundary approximation increase. This is confirmed empirically: in experiments, the estimator converges to the true parameter values as the numbers of data samples and boundary points grow.
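
As a rough illustration of this kind of consistency experiment (not the paper's setup), the sketch below estimates a one-dimensional Gaussian mean by minimising the classical KSD over a parameter grid and prints the estimation error as the sample size grows; the paper's experiments do the analogous thing with TKSD on truncated data with boundary samples. All names and modelling choices here are assumptions for illustration.

    # Consistency-check sketch: minimum-KSD estimation of a 1-D Gaussian mean.
    # The error should shrink as n grows (up to Monte Carlo noise and the grid
    # resolution). TKSD would replace this objective and add boundary points.
    import numpy as np

    def ksd_objective(x, theta, sigma=1.0):
        """V-statistic KSD for the model N(theta, 1) with an RBF kernel."""
        s = -(x - theta)                       # model score at each data point
        d = x[:, None] - x[None, :]            # pairwise differences x_i - x_j
        k = np.exp(-d ** 2 / (2 * sigma ** 2))
        dky = d / sigma ** 2 * k               # d k / d x_j
        dkx = -dky                             # d k / d x_i
        dkxy = (1 / sigma ** 2 - d ** 2 / sigma ** 4) * k
        u = np.outer(s, s) * k + s[:, None] * dky + s[None, :] * dkx + dkxy
        return u.mean()

    rng = np.random.default_rng(0)
    true_mean, grid = 2.0, np.linspace(1.0, 3.0, 161)
    for n in [50, 200, 800]:
        x = rng.normal(true_mean, 1.0, size=n)
        # Grid search keeps the sketch short; kernel terms are recomputed per
        # grid point for clarity rather than speed.
        theta_hat = grid[np.argmin([ksd_objective(x, t) for t in grid])]
        print(n, abs(theta_hat - true_mean))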

Future Directions

The paper opens several potential avenues for future research:

  • Generalization to Broader Classes of Truncation: While the focus is on densities truncated by unknown boundaries, extending these methods to handle more complex, possibly dynamic boundaries would be beneficial.
  • Integration with High-Dimensional Models: The integration of TKSD with high-dimensional Bayesian models and complex latent variable frameworks could expand its applicability in fields like genetics or financial modelling, where data is often naturally truncated.
  • Computational Optimization: Further exploration into optimizing the computational aspects associated with larger kernel matrices and boundary point calculations could enhance the method's scalability.

In conclusion, this paper provides a thorough and substantial contribution to the field of truncated density estimation, introducing tools that balance theoretical rigor with practical application. Its innovative use of approximate Stein classes challenges existing paradigms and suggests a promising direction for future research in statistical modelling with truncated data.
