
Neural Estimation of Entropic Optimal Transport (2405.06734v1)

Published 10 May 2024 in math.ST and stat.TH

Abstract: Optimal transport (OT) serves as a natural framework for comparing probability measures, with applications in statistics, machine learning, and applied mathematics. Alas, statistical estimation and exact computation of the OT distances suffer from the curse of dimensionality. To circumvent these issues, entropic regularization has emerged as a remedy that enables parametric estimation rates via plug-in and efficient computation using Sinkhorn iterations. Motivated by further scaling up entropic OT (EOT) to data dimensions and sample sizes that appear in modern machine learning applications, we propose a novel neural estimation approach. Our estimator parametrizes a semi-dual representation of the EOT distance by a neural network, approximates expectations by sample means, and optimizes the resulting empirical objective over parameter space. We establish non-asymptotic error bounds on the EOT neural estimator of the cost and optimal plan. Our bounds characterize the effective error in terms of neural network size and the number of samples, revealing optimal scaling laws that guarantee parametric convergence. The bounds hold for compactly supported distributions and imply that the proposed estimator is minimax-rate optimal over that class. Numerical experiments validating our theory are also provided.
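
To make the abstract's description concrete, below is a minimal sketch (not the authors' implementation) of a semi-dual neural EOT estimator. It assumes a squared-Euclidean cost c(x, y) = ||x - y||^2, a shallow ReLU network for the dual potential, and the standard semi-dual form EOT_eps(mu, nu) = sup_phi E_mu[phi(X)] + E_nu[phi^{c,eps}(Y)], where phi^{c,eps}(y) = -eps * log E_mu[exp((phi(X) - c(X, y)) / eps)] is the smoothed c-transform. All names (Potential, semi_dual_objective, eps, hidden) are illustrative choices, not taken from the paper.

```python
# Hedged sketch of semi-dual neural EOT estimation: parametrize the dual
# potential by a neural network, replace expectations by sample means, and
# maximize the resulting empirical objective over network parameters.
import math

import torch
import torch.nn as nn


class Potential(nn.Module):
    """Shallow ReLU network parametrizing the dual potential phi_theta."""

    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)


def semi_dual_objective(phi, x, y, eps=0.5):
    """Empirical semi-dual objective; its maximum estimates EOT_eps(mu, nu)."""
    cost = torch.cdist(x, y) ** 2                 # c(x_i, y_j), shape (n, m)
    phi_x = phi(x)                                # phi_theta(x_i), shape (n,)
    # Smoothed c-transform phi^{c,eps}(y_j), with E_mu replaced by a
    # log-mean-exp over the x-samples.
    log_n = math.log(x.shape[0])
    phi_c = -eps * (torch.logsumexp((phi_x[:, None] - cost) / eps, dim=0) - log_n)
    return phi_x.mean() + phi_c.mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    d, n = 2, 512
    x = torch.randn(n, d)             # samples from mu
    y = torch.randn(n, d) + 1.0       # samples from nu (shifted Gaussian)
    phi = Potential(d)
    opt = torch.optim.Adam(phi.parameters(), lr=1e-3)
    for step in range(2000):
        opt.zero_grad()
        loss = -semi_dual_objective(phi, x, y, eps=0.5)  # gradient ascent
        loss.backward()
        opt.step()
    print("estimated EOT cost:", -loss.item())
```

In this sketch, the sample means stand in for the population expectations, mirroring the estimation step the abstract describes; the regularization parameter eps and the network width play the roles of the sample size and network size appearing in the paper's error bounds.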
