
A Kernelized Stein Discrepancy for Goodness-of-fit Tests and Model Evaluation (1602.03253v2)

Published 10 Feb 2016 in stat.ML

Abstract: We derive a new discrepancy statistic for measuring differences between two probability distributions based on combining Stein's identity with the reproducing kernel Hilbert space theory. We apply our result to test how well a probabilistic model fits a set of observations, and derive a new class of powerful goodness-of-fit tests that are widely applicable for complex and high dimensional distributions, even for those with computationally intractable normalization constants. Both theoretical and empirical properties of our methods are studied thoroughly.

Citations (461)

Summary

  • The paper presents a novel kernelized Stein discrepancy leveraging RKHS and Stein's identity to measure differences between probability distributions.
  • It provides a likelihood-free approach for goodness-of-fit testing, ideal for evaluating models with intractable normalizing constants.
  • Empirical and theoretical analyses confirm the method's robustness, making it a valuable tool for validating deep and graphical models.

A Kernelized Stein Discrepancy for Goodness-of-fit Tests

The paper "A Kernelized Stein Discrepancy for Goodness-of-fit Tests" introduces a novel approach for evaluating the goodness-of-fit of probabilistic models, particularly when dealing with complex and high-dimensional distributions. The primary contribution is the derivation of a Kernelized Stein Discrepancy (KSD) that facilitates goodness-of-fit testing without relying on likelihood calculations, which are often computationally prohibitive.

Key Contributions

  1. Novel Discrepancy Measure: The authors develop a discrepancy statistic leveraging Stein's identity coupled with reproducing kernel Hilbert spaces (RKHS). This approach provides a robust framework for measuring differences between probability distributions.
  2. Likelihood-free Testing: Traditional goodness-of-fit tests depend on likelihood-based methods, which are infeasible for models with intractable normalizing constants. The proposed method offers a likelihood-free alternative by using score functions that do not require normalization constants. This is particularly advantageous for evaluating models like large graphical or deep generative models.
  3. Theoretical and Empirical Analysis: The paper presents a comprehensive study of both the theoretical underpinnings and the empirical performance of the proposed method. The results affirm the applicability of the KSD method to complex models that challenge conventional goodness-of-fit tests such as the chi-square or Kolmogorov-Smirnov tests.
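The likelihood-free property in contribution 2 rests on a simple fact: the score function ∇ₓ log q(x) is unchanged if q is known only up to a normalizing constant, since a multiplicative constant becomes an additive constant in the log and vanishes under differentiation. A minimal numerical check, using a hypothetical unnormalized density chosen for illustration:

```python
# Hypothetical unnormalized density q_tilde(x) proportional to exp(-(x - 1)^2 / 2),
# i.e. N(1, 1) with its normalizing constant 1/sqrt(2*pi) deliberately omitted.
def log_q_tilde(x):
    return -(x - 1.0) ** 2 / 2.0

def numerical_score(x, eps=1e-5):
    # Central-difference estimate of d/dx log q(x). Any multiplicative
    # normalizing constant is additive in the log and cancels here.
    return (log_q_tilde(x + eps) - log_q_tilde(x - eps)) / (2.0 * eps)

x = 0.3
approx = numerical_score(x)  # score computed from the unnormalized density
exact = -(x - 1.0)           # analytic score of the *normalized* N(1, 1)
```

The two values agree, which is why models such as undirected graphical models or energy-based models, whose partition functions are intractable, still expose a usable score function.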

Technical Details

  • Stein's Method: The paper capitalizes on Stein's method, which provides bounds for distributional distances. The authors incorporate this method into an RKHS framework, allowing for a flexible and computationally efficient way to define the Stein discrepancy.
  • Reproducing Kernel Hilbert Spaces: By defining the discrepancy within an RKHS, the authors obtain a closed-form expression that can be estimated empirically using U-statistics. This is crucial for obtaining statistically meaningful results without requiring likelihood computations.
  • Empirical Estimation: Given a sample from a distribution p(x), the Stein discrepancy is estimated via a U-statistic that uses only the score function of the model distribution q(x), efficiently capturing the discrepancy between p(x) and q(x).
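The U-statistic estimator above can be sketched concretely. The KSD is the expectation of a "Stein kernel" u_q(x, x′) built from the model score and the base kernel; with an RBF kernel its gradient and trace terms have closed forms. The following is a minimal numpy sketch (function names, bandwidth choice, and the Gaussian test case are illustrative assumptions, not the paper's reference implementation):

```python
import numpy as np

def ksd_u_statistic(X, score_q, h=1.0):
    """U-statistic estimate of the KSD between a sample X ~ p and a model q,
    given only q's score function grad_x log q (no normalizing constant).
    Uses the RBF kernel k(x, y) = exp(-||x - y||^2 / (2 h^2))."""
    n, d = X.shape
    S = score_q(X)                        # (n, d) model scores at sample points
    diff = X[:, None, :] - X[None, :, :]  # (n, n, d) pairwise x_i - x_j
    sq = (diff ** 2).sum(-1)              # (n, n) squared distances
    K = np.exp(-sq / (2 * h ** 2))        # RBF kernel matrix

    term1 = (S @ S.T) * K                                   # s_i^T k(x_i,x_j) s_j
    term2 = np.einsum('id,ijd->ij', S, diff) * K / h ** 2   # s_i^T grad_{x_j} k
    term3 = -np.einsum('jd,ijd->ij', S, diff) * K / h ** 2  # grad_{x_i} k^T s_j
    term4 = K * (d / h ** 2 - sq / h ** 4)                  # tr(grad_x grad_y k)

    U = term1 + term2 + term3 + term4
    np.fill_diagonal(U, 0.0)              # U-statistic: drop i == j pairs
    return U.sum() / (n * (n - 1))

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 1))                 # sample from p = N(0, 1)
ksd_good = ksd_u_statistic(X, lambda x: -x)       # correct model q = N(0, 1)
ksd_bad = ksd_u_statistic(X, lambda x: -(x - 2))  # misspecified q = N(2, 1)
```

Under the correct model the estimate concentrates near zero, while the misspecified model yields a clearly larger value; in practice the null distribution for a test threshold is obtained by bootstrapping the U-statistic.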

Implications and Future Directions

The proposed kernelized Stein discrepancy opens new avenues for model evaluation in complex settings where traditional methods falter. By mitigating the reliance on likelihood calculations, this method paves the way for more robust model assessments, particularly in machine learning applications involving deep or graphical models.

Theoretical implications include establishing a new kind of distance measure between distributions, providing insights into the convergence and adaptability of statistical models in machine learning. Practically, the methodology can improve model validation processes across various applications, from Bayesian inference to generative modeling.

Future research could focus on extending this method to composite hypothesis testing, exploring connections with other distance measures like Fisher divergence, and comparative studies against state-of-the-art approaches in machine learning model evaluation.

Conclusion

This paper provides a significant advancement in the domain of goodness-of-fit testing, offering a computationally tractable method aligned with the complexities of modern probabilistic models. The introduction of the Kernelized Stein Discrepancy not only advances theoretical knowledge but also equips practitioners with a practical tool for model validation in challenging environments.