Towards understanding Accelerated Stein Variational Gradient Flow -- Analysis of Generalized Bilinear Kernels for Gaussian target distributions (arXiv:2509.04008v1)

Published 4 Sep 2025 in math.OC and stat.ML

Abstract: Stein variational gradient descent (SVGD) is a kernel-based, non-parametric particle method for sampling from a target distribution, as arises in Bayesian inference and other machine learning tasks. Unlike other particle methods, SVGD does not require estimating the score (the gradient of the log-density) of the evolving particle distribution. In practice, however, SVGD can be slow compared to score-estimation-based sampling algorithms. To design a fast and efficient high-dimensional sampling algorithm that retains the advantages of SVGD, we introduce accelerated SVGD (ASVGD), based on an accelerated gradient flow in a metric space of probability densities following Nesterov's method. We then derive a momentum-based discrete-time sampling algorithm that evolves a set of particles deterministically. To stabilize the particles' position updates, we also include a Wasserstein metric regularization. This paper extends the conference version [SL2025]. For the bilinear kernel and Gaussian target distributions, we study the kernel and damping parameters that yield the optimal convergence rate of the proposed dynamics, obtained by analyzing the linearized accelerated gradient flow at the equilibrium. Interestingly, the optimal parameter is a constant that does not depend on the covariance of the target distribution. For generalized kernel functions, such as the Gaussian kernel, numerical examples with varied target distributions demonstrate the effectiveness of ASVGD compared to SVGD and other popular sampling methods. Furthermore, in the setting of Bayesian neural networks, ASVGD significantly outperforms SVGD in terms of log-likelihood and total iteration time.
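
To make the particle update concrete, below is a minimal sketch of a plain SVGD step and a heavy-ball-style accelerated variant. It assumes a Gaussian (RBF) kernel with fixed bandwidth `h` and a known target score; the function names, step size `eps`, and damping `beta` are illustrative assumptions, and the paper's actual ASVGD (a Nesterov-type accelerated flow with tuned damping and Wasserstein metric regularization) may discretize the dynamics differently.

```python
import numpy as np

def rbf_kernel(X, h):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / h)
    and its gradient w.r.t. the first argument, shape (n, n, d)."""
    diff = X[:, None, :] - X[None, :, :]           # (n, n, d) pairwise differences
    K = np.exp(-np.sum(diff**2, axis=-1) / h)      # (n, n) kernel matrix
    grad_K = -2.0 / h * diff * K[:, :, None]       # grad_K[i, j] = d k(x_i, x_j) / d x_i
    return K, grad_K

def svgd_direction(X, score, h):
    """Standard SVGD direction:
    phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) * score(x_j) + grad_{x_j} k(x_j, x_i) ]."""
    n = X.shape[0]
    K, grad_K = rbf_kernel(X, h)
    S = score(X)                                   # (n, d) target score at each particle
    # K.T @ S sums k(x_j, x_i) * score(x_j) over j (driving term);
    # grad_K.sum(axis=0)[i] sums grad_{x_j} k(x_j, x_i) over j (repulsion term).
    return (K.T @ S + grad_K.sum(axis=0)) / n

def accelerated_svgd_sketch(X0, score, steps=500, eps=0.05, beta=0.9, h=1.0):
    """Hypothetical heavy-ball acceleration: each particle carries a velocity
    that accumulates the SVGD direction. This stands in for the paper's
    momentum-based discretization, whose exact form may differ."""
    X, V = X0.copy(), np.zeros_like(X0)
    for _ in range(steps):
        V = beta * V + svgd_direction(X, score, h)
        X = X + eps * V
    return X

# Example: standard 2-D Gaussian target, for which score(x) = -x.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(100, 2)) + 5.0               # particles start far from the mode
X = accelerated_svgd_sketch(X0, score=lambda X: -X)
print(X.mean(axis=0))                              # should approach [0, 0]
```

Note that for a Gaussian target N(mu, Sigma) the score is available in closed form, score(x) = -Sigma^{-1}(x - mu), so no score estimation is needed; this is the SVGD advantage the abstract highlights.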
