KL/TV guarantees for randomized midpoint methods with sub-sqrt(d) gradient evaluations

Determine whether randomized midpoint algorithms for discretizing Langevin dynamics can achieve accuracy in either Kullback–Leibler (KL) divergence or total variation (TV) distance using $o(d^{1/2})$ total gradient evaluations when sampling from smooth target distributions (e.g., those considered by Shen and Lee, 2019).

Background

The paper contrasts its fixed-grid Picard-iteration approach with the randomized midpoint framework of Shen and Lee (2019). While the latter achieves $2$-Wasserstein accuracy with $\widetilde O(d^{1/3})$ gradient evaluations, the authors point out barriers to establishing stronger KL or TV guarantees for such randomized midpoint methods.
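
For intuition, the randomized midpoint idea can be sketched as follows. Consider overdamped Langevin dynamics $dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dB_t$, whose stationary distribution is $\pi \propto e^{-f}$. A step of size $h$ evaluates the drift at a uniformly random time $\alpha h$ with $\alpha \sim \mathrm{Unif}[0,1]$, so that in expectation over $\alpha$ the drift term is an unbiased estimate of the integrated drift $\int_0^h \nabla f(X_t)\,dt$. The Python sketch below illustrates one such step for the overdamped dynamics; note that Shen and Lee (2019) analyze the underdamped variant with a more careful coupling, and the names used here (`randomized_midpoint_step`, `grad_f`) are illustrative rather than taken from either paper.

```python
import numpy as np

def randomized_midpoint_step(x, grad_f, h, rng):
    """One step of a randomized-midpoint discretization of overdamped
    Langevin dynamics  dX_t = -grad f(X_t) dt + sqrt(2) dB_t.

    Illustrative sketch only; Shen-Lee (2019) analyze the underdamped
    variant of this scheme.
    """
    d = x.shape[0]
    alpha = rng.uniform()  # random midpoint time alpha * h in [0, h]
    # Independent Brownian increments on [0, alpha*h] and [alpha*h, h]
    dB1 = np.sqrt(alpha * h) * rng.standard_normal(d)
    dB2 = np.sqrt((1.0 - alpha) * h) * rng.standard_normal(d)
    # Coarse Euler estimate of the state at the random midpoint
    x_mid = x - alpha * h * grad_f(x) + np.sqrt(2.0) * dB1
    # Full step: h * grad_f(x_mid) is, in expectation over alpha,
    # an unbiased estimate of the integrated drift over [0, h]
    return x - h * grad_f(x_mid) + np.sqrt(2.0) * (dB1 + dB2)

# Example: sampling from a standard Gaussian, f(x) = ||x||^2 / 2
rng = np.random.default_rng(0)
x = np.zeros(5)
for _ in range(1000):
    x = randomized_midpoint_step(x, lambda x: x, h=0.1, rng=rng)
```

This randomization is what enables the improved $W_2$ rate; the open question asks whether any analysis of such a scheme (or a variant) can convert it into KL or TV guarantees with $o(d^{1/2})$ gradient evaluations.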

They explicitly note that, despite this success in $W_2$, it remains unknown whether KL or TV accuracy can be obtained with $o(\sqrt{d})$ gradient evaluations within the randomized midpoint paradigm. This uncertainty motivates the authors’ shift to a different parallelization strategy that does enable KL/TV guarantees, albeit with different gradient-complexity trade-offs.

References

Unfortunately, there seem to be fundamental barriers to obtaining KL or TV accuracy guarantees for randomized midpoint algorithms. To illustrate, while accuracy in $2$-Wasserstein distance can be achieved using $\widetilde O(d^{1/3})$ gradient evaluations via a randomized midpoint algorithm (Shen and Lee, 2019, Algorithm 1), accuracy in KL or TV distance using $o(d^{1/2})$ gradient evaluations is not known.

Fast parallel sampling under isoperimetry (arXiv:2401.09016, Anari et al., 17 Jan 2024), Section 1 (Introduction), “Analysis techniques” subsection.