Universal Neural Optimal Transport (2212.00133v6)
Abstract: Optimal Transport (OT) problems are a cornerstone of many applications, but solving them is computationally expensive. To address this challenge, we propose UNOT (Universal Neural Optimal Transport), a novel framework capable of accurately predicting (entropic) OT distances and plans between discrete measures for a given cost function. UNOT builds on Fourier Neural Operators, a universal class of neural networks that map between function spaces and are discretization-invariant, which enables our network to process measures of variable resolution. The network is trained adversarially using a second, generating network and a self-supervised bootstrapping loss. We ground UNOT in an extensive theoretical framework. Through experiments on Euclidean and non-Euclidean domains, we show that our network not only accurately predicts OT distances and plans across a wide range of datasets, but also faithfully captures the geometry of the Wasserstein space. Furthermore, we show that our network can be used as a state-of-the-art initialization for the Sinkhorn algorithm, yielding speedups of up to $7.4\times$ and significantly outperforming existing approaches.
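The speedup from warm-starting Sinkhorn comes from seeding the iterations with a good dual potential instead of zeros. As a minimal sketch (not the paper's implementation; function and parameter names here are illustrative), the following NumPy code runs log-domain Sinkhorn for entropic OT between two discrete measures, with an optional `f_init` argument playing the role of a predicted initialization:

```python
import numpy as np

def _logsumexp(M, axis):
    # Numerically stable log-sum-exp along the given axis.
    m = M.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(M - m).sum(axis=axis, keepdims=True)), axis=axis)

def sinkhorn(a, b, C, eps=0.05, n_iters=500, f_init=None):
    """Log-domain Sinkhorn for entropic OT.

    a, b   : source/target histograms (positive, summing to 1)
    C      : cost matrix, shape (len(a), len(b))
    eps    : entropic regularization strength
    f_init : optional warm start for the source dual potential
             (a predicted potential would be supplied here)
    Returns the dual potentials (f, g) and the transport plan P.
    """
    f = np.zeros_like(a) if f_init is None else f_init.copy()
    log_a, log_b = np.log(a), np.log(b)
    for _ in range(n_iters):
        # Alternating soft c-transform updates of the dual potentials.
        g = eps * (log_b - _logsumexp((f[:, None] - C) / eps, axis=0))
        f = eps * (log_a - _logsumexp((g[None, :] - C) / eps, axis=1))
    P = np.exp((f[:, None] + g[None, :] - C) / eps)  # entropic transport plan
    return f, g, P

# Usage: uniform measures on a 1D grid with squared-distance cost.
pts = np.linspace(0.0, 1.0, 5)
C = (pts[:, None] - pts[None, :]) ** 2
a = b = np.full(5, 0.2)
f, g, P = sinkhorn(a, b, C)
cost = (P * C).sum()  # entropic OT cost estimate
```

After the final `f`-update the row marginals of `P` match `a` exactly, while the column marginals converge to `b` as iterations proceed; a warm start shifts this convergence forward, which is where the reported speedups originate.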