Optimal Transport for Measures with Noisy Tree Metric (2310.13653v3)
Abstract: We study the optimal transport (OT) problem for probability measures supported on a tree metric space. It is known that this OT problem (i.e., tree-Wasserstein (TW)) admits a closed-form expression, but it depends fundamentally on the underlying tree structure over the supports of the input measures. In practice, however, the given tree structure may be perturbed by noisy or adversarial measurements. To mitigate this issue, we follow the max-min robust OT approach, which considers the maximal possible distance between two input measures over an uncertainty set of tree metrics. In general, this approach is hard to compute, even for measures supported in one-dimensional space, because of its non-convexity and non-smoothness, which hinder its practical application, especially in large-scale settings. In this work, we propose novel uncertainty sets of tree metrics through the lens of edge deletion/addition, which cover a diversity of tree structures in an elegant framework. Consequently, by building upon the proposed uncertainty sets and leveraging the tree structure over the supports, we show that the robust OT also admits a closed-form expression for fast computation, like its standard OT counterpart (i.e., TW). Furthermore, we demonstrate that the robust OT satisfies the metric property and is negative definite. We then exploit its negative definiteness to propose positive definite kernels and test them in several simulations on various real-world datasets for document classification and topological data analysis.
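As a rough illustration of the closed form mentioned in the abstract, the sketch below computes the standard tree-Wasserstein distance as a weighted sum over edges, TW(mu, nu) = sum over edges e of w_e * |mu(subtree below e) - nu(subtree below e)|. It then shows two simple consequences: a worst-case value over a box of edge weights, and an exponential kernel. The tree encoding (`parent`, `weight`), the function names, and the box uncertainty set with radius `lam` are assumptions made for this example only. In particular, the box set is a simplified stand-in and is not claimed to be the uncertainty set proposed in the paper.

```python
import math

def tree_wasserstein(parent, weight, mu, nu):
    """Closed-form tree-Wasserstein distance.

    Assumed encoding (hypothetical, for illustration): node 0 is the root,
    parent[v] < v for every non-root node v, and weight[v] is the length of
    the edge between v and parent[v] (weight[0] is unused).
    mu and nu are lists of nonnegative masses per node with equal total mass.
    """
    n = len(parent)
    # diff[v] accumulates (mu - nu) over the subtree rooted at v.
    diff = [m - q for m, q in zip(mu, nu)]
    total = 0.0
    # Children have larger indices than parents, so a reverse sweep visits
    # every child before its parent.
    for v in range(n - 1, 0, -1):
        total += weight[v] * abs(diff[v])
        diff[parent[v]] += diff[v]
    return total

def box_robust_tw(parent, weight, mu, nu, lam):
    """Worst case of TW when each edge weight may vary in [w - lam, w + lam].

    TW is linear in the edge weights with nonnegative coefficients, so the
    maximum over this (assumed) box is attained at the upper bounds.
    """
    return tree_wasserstein(parent, [w + lam for w in weight], mu, nu)

def exp_kernel(distance, t=1.0):
    """exp(-t * d), positive definite for t > 0 when d is negative definite."""
    return math.exp(-t * distance)

# Path tree 0 - 1 - 2 with edge lengths 1 and 2.
parent = [-1, 0, 1]
weight = [0.0, 1.0, 2.0]
mu = [1.0, 0.0, 0.0]   # unit mass at the root
nu = [0.0, 0.0, 1.0]   # unit mass at the leaf

tw = tree_wasserstein(parent, weight, mu, nu)
robust = box_robust_tw(parent, weight, mu, nu, lam=0.5)
print(tw, robust, exp_kernel(tw))
```

In this toy case both measures are point masses, so the distance equals the path length between the two nodes. The single reverse sweep makes the cost linear in the number of nodes, which is the practical appeal of a closed form over a general OT solver.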