Markovian Sliced Wasserstein Distances: Beyond Independent Projections (2301.03749v3)
Abstract: The sliced Wasserstein (SW) distance suffers from redundant projections because its projecting directions are drawn independently and uniformly at random. To partially overcome this issue, the max-K sliced Wasserstein (Max-K-SW) distance ($K \geq 1$) seeks the $K$ most discriminative orthogonal projecting directions. Although Max-K-SW can reduce the number of projections, its metricity cannot be guaranteed in practice because the underlying optimization may not reach optimality. Moreover, the orthogonality constraint is computationally expensive and may not be effective. To address these problems, we introduce a new family of SW distances, named Markovian sliced Wasserstein (MSW) distances, which impose a first-order Markov structure on the projecting directions. We discuss various members of the MSW family obtained by specifying the Markov structure, including the prior distribution, the transition distribution, and the burn-in and thinning technique. Moreover, we investigate the theoretical properties of MSW, including topological properties (metricity, weak convergence, and connections to other distances), statistical properties (sample complexity and Monte Carlo estimation error), and computational properties (computational and memory complexity). Finally, we compare MSW distances with previous SW variants in applications such as gradient flows, color transfer, and deep generative modeling to demonstrate the favorable performance of MSW.
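The core idea in the abstract — replacing i.i.d. uniform projecting directions with a first-order Markov chain of directions — can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the uniform prior and the re-normalized Gaussian random-walk transition below are illustrative stand-ins for the prior and transition distributions the paper specifies, and `kappa` is a hypothetical concentration parameter controlling step size.

```python
import numpy as np

def rand_unit(d, rng):
    """Draw one direction uniformly from the unit sphere in R^d."""
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

def markov_directions(L, d, kappa=10.0, rng=None):
    """First-order Markov chain of L projection directions on the sphere.
    Prior: uniform; transition: Gaussian random-walk step, re-normalized
    (an illustrative stand-in for the transitions discussed in the paper)."""
    rng = np.random.default_rng() if rng is None else rng
    thetas = [rand_unit(d, rng)]
    for _ in range(L - 1):
        step = thetas[-1] + rng.standard_normal(d) / kappa
        thetas.append(step / np.linalg.norm(step))
    return np.stack(thetas)  # shape (L, d)

def sliced_w2(X, Y, thetas):
    """Monte Carlo sliced Wasserstein-2 between two empirical measures
    with equal sample sizes: project, sort, and average the 1-D costs."""
    px = np.sort(X @ thetas.T, axis=0)  # (n, L) sorted projections of X
    py = np.sort(Y @ thetas.T, axis=0)  # (n, L) sorted projections of Y
    return np.sqrt(np.mean((px - py) ** 2))
```

Swapping `markov_directions` for i.i.d. draws of `rand_unit` recovers the standard Monte Carlo SW estimator; burn-in and thinning, as mentioned in the abstract, would correspond to discarding the first few `thetas` and keeping every $k$-th one.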