Sliced Score Matching: A Scalable Approach to Density and Score Estimation (1905.07088v2)

Published 17 May 2019 in cs.LG and stat.ML

Abstract: Score matching is a popular method for estimating unnormalized statistical models. However, it has been so far limited to simple, shallow models or low-dimensional data, due to the difficulty of computing the Hessian of log-density functions. We show this difficulty can be mitigated by projecting the scores onto random vectors before comparing them. This objective, called sliced score matching, only involves Hessian-vector products, which can be easily implemented using reverse-mode automatic differentiation. Therefore, sliced score matching is amenable to more complex models and higher dimensional data compared to score matching. Theoretically, we prove the consistency and asymptotic normality of sliced score matching estimators. Moreover, we demonstrate that sliced score matching can be used to learn deep score estimators for implicit distributions. In our experiments, we show sliced score matching can learn deep energy-based models effectively, and can produce accurate score estimates for applications such as variational inference with implicit distributions and training Wasserstein Auto-Encoders.

Citations (349)

Summary

The paper introduces Sliced Score Matching, a method that projects scores onto random vectors before comparing them, so that only Hessian-vector products are needed instead of full Hessians.
It establishes theoretical guarantees including consistency and asymptotic normality, ensuring reliable parameter estimation as sample sizes increase.
Numerical experiments validate its efficiency and practical simplicity for training deep energy-based and generative models on high-dimensional data.

Sliced Score Matching: A Scalable Approach to Density and Score Estimation

The paper presents sliced score matching, a method for learning unnormalized statistical models, especially energy-based models, and for estimating the scores of implicit distributions. It targets the main computational bottleneck of standard score matching: the objective depends on the Hessian of the log-density function, which is expensive to compute for complex models or high-dimensional data. This is why standard score matching has mostly been limited to simple, shallow models or low-dimensional data.
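
For reference, the standard score matching objective can be written as follows. This is a sketch in notation assumed here (the summary itself does not fix symbols): p_d is the data distribution, p_m the model density with parameters θ, and s_m the model score. The trace term is the trace of the Hessian of the log-density, which is the quantity that becomes costly as the data dimension grows.

```latex
J(\theta) = \mathbb{E}_{p_d(x)}\left[
  \operatorname{tr}\big(\nabla_x s_m(x;\theta)\big)
  + \tfrac{1}{2}\,\lVert s_m(x;\theta) \rVert_2^2
\right],
\qquad
s_m(x;\theta) = \nabla_x \log p_m(x;\theta)
```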

Sliced score matching avoids this cost by projecting the scores of the data and model distributions onto random vectors before comparing them, which reduces a multidimensional comparison to a set of one-dimensional ones. The resulting objective needs only Hessian-vector products rather than the full Hessian, and these can be computed efficiently with reverse-mode automatic differentiation in modern machine learning libraries such as TensorFlow and PyTorch.
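
The projection idea can be illustrated with a minimal, dependency-free sketch. For each data point x and random direction v, the loss averages v^T ∇_x s(x) v + ½ (v^T s(x))², where s is the model score. Several things here are assumptions made for illustration and do not come from the paper's code: the model is a hypothetical isotropic Gaussian with a closed-form score, the directions are Rademacher vectors, all function names are invented, and a central finite difference along v stands in for the Hessian-vector product that an automatic differentiation library would compute.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def make_gaussian_score(mean, precision):
    """Score of a hypothetical isotropic Gaussian: s(x) = -precision * (x - mean)."""
    def score(x):
        return [-precision * (xi - mi) for xi, mi in zip(x, mean)]
    return score


def projected_derivative(score_fn, x, v, eps=1e-4):
    """Approximate v^T (ds/dx) v as the derivative of v^T s(x + t v) at t = 0.

    Assumption: a central finite difference replaces the Hessian-vector
    product that reverse-mode autodiff would provide in a real implementation.
    """
    x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
    x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
    return (dot(v, score_fn(x_plus)) - dot(v, score_fn(x_minus))) / (2.0 * eps)


def sliced_score_matching_loss(score_fn, data, rng, n_projections=1):
    """Monte Carlo estimate of the sliced objective over data and directions."""
    total = 0.0
    for x in data:
        for _ in range(n_projections):
            # Rademacher direction: each coordinate is +1 or -1.
            v = [rng.choice((-1.0, 1.0)) for _ in x]
            total += projected_derivative(score_fn, x, v)
            total += 0.5 * dot(v, score_fn(x)) ** 2
    return total / (len(data) * n_projections)


rng = random.Random(0)
true_mean = [1.0, -2.0]
# Synthetic data from a unit-variance Gaussian centered at true_mean.
data = [[rng.gauss(m, 1.0) for m in true_mean] for _ in range(2000)]

loss_at_truth = sliced_score_matching_loss(
    make_gaussian_score(true_mean, 1.0), data, rng)
loss_shifted = sliced_score_matching_loss(
    make_gaussian_score([2.0, -1.0], 1.0), data, rng)
print(loss_at_truth, loss_shifted)
```

For this toy setup the loss is about -1 in expectation at the true mean and about 0 at the shifted mean, so the objective prefers the correct parameters without ever forming a Hessian matrix. In a deep model the same quantity would normally come from two backward passes through an autodiff library rather than from finite differences.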

The paper also makes theoretical contributions. The authors prove that the sliced score matching estimator is consistent and asymptotically normal. Under the paper's assumptions, this means that as the sample size grows the estimator converges to the true parameter values, and its suitably rescaled error becomes approximately normally distributed, which supports statistical inference. These guarantees give the method a sound statistical basis.
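
In symbols, the two guarantees take the usual form below. The notation is assumed here for illustration: \hat{\theta}_N is the estimator computed from N samples, \theta^* is the true parameter, and \Sigma is an asymptotic covariance matrix whose explicit form is not reproduced in this summary.

```latex
\hat{\theta}_N \xrightarrow{\;p\;} \theta^*,
\qquad
\sqrt{N}\,\big(\hat{\theta}_N - \theta^*\big) \xrightarrow{\;d\;} \mathcal{N}(0, \Sigma)
```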

The numerical experiments demonstrate the method's effectiveness. The comparison with other scalable alternatives, such as Vincent's denoising score matching, which recasts the problem in terms of noise-corrupted distributions, highlights the computational and practical advantages of the proposed method. Specifically, sliced score matching does not require choosing auxiliary parameters, and it avoids the complexities of approximation techniques such as curvature propagation or approximate backpropagation.
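
To make the point about auxiliary parameters concrete, the denoising score matching objective can be sketched as follows. The notation is assumed here, and the Gaussian corruption kernel is one common choice rather than something this summary specifies. The noise scale \sigma must be chosen by the user, and the model score is matched to the noise-corrupted distribution rather than to the data distribution itself.

```latex
J_{\mathrm{DSM}}(\theta) = \mathbb{E}_{p_d(x)}\,\mathbb{E}_{q_\sigma(\tilde{x}\mid x)}
\left[ \tfrac{1}{2}\,\Big\lVert s_m(\tilde{x};\theta)
  - \nabla_{\tilde{x}} \log q_\sigma(\tilde{x}\mid x) \Big\rVert_2^2 \right],
\qquad
\nabla_{\tilde{x}} \log q_\sigma(\tilde{x}\mid x) = -\frac{\tilde{x} - x}{\sigma^2}
\ \text{ for } q_\sigma(\tilde{x}\mid x) = \mathcal{N}(\tilde{x};\, x,\, \sigma^2 I)
```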

In practice, the method is simple to implement and has a lower computational cost than standard score matching, which makes it attractive to practitioners. Its scalability is shown by training deep energy-based models, such as deep kernel exponential families and NICE flow-based models. Its use also extends beyond density estimation to score estimation for implicit distributions. This is needed for variational inference in models with implicit components and for training generative models such as Wasserstein Auto-Encoders, and both applications are reported to benefit from the more accurate score estimates that sliced score matching provides.

In terms of implications, sliced score matching makes score-based training practical for a wider range of unnormalized models. This in turn makes it easier to learn from complex, high-dimensional datasets without the computational burden of full Hessians.

Future work might deepen the theoretical analysis, develop algorithms tailored to specific model structures or data types, improve optimization, or extend the framework to new kinds of statistical models. The paper lays solid groundwork for such extensions, making sliced score matching a useful contribution to scalable density and score estimation.
