Comparing Multivariate Distributions: A Novel Approach Using Optimal Transport-based Plots (2404.19700v1)
Abstract: Quantile-Quantile (Q-Q) plots are widely used for assessing the distributional similarity between two datasets. Traditionally, Q-Q plots are constructed for univariate distributions, making them less effective in capturing complex dependencies present in multivariate data. In this paper, we propose a novel approach for constructing multivariate Q-Q plots, which extend the traditional Q-Q plot methodology to handle high-dimensional data. Our approach utilizes optimal transport (OT) and entropy-regularized optimal transport (EOT) to align the empirical quantiles of the two datasets. Additionally, we introduce another technique based on OT and EOT potentials which can effectively compare two multivariate datasets. Through extensive simulations and real data examples, we demonstrate the effectiveness of our proposed approach in capturing multivariate dependencies and identifying distributional differences such as tail behaviour. We also propose two test statistics based on the Q-Q and potential plots to compare two distributions rigorously.
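The idea of aligning two multivariate samples through entropy-regularized optimal transport can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes uniform weights on both samples, a squared Euclidean cost, log-domain Sinkhorn iterations, and a barycentric projection to turn the transport plan into one matched point per source observation. The function names `sinkhorn_plan`, `barycentric_map` and `qq_pairs`, and the values of `eps` and `n_iter`, are hypothetical choices made for illustration. The code uses only the Python standard library so that it stays self-contained.

```python
import math
import random


def sq_dist(x, y):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def sinkhorn_plan(xs, ys, eps=0.5, n_iter=300):
    """Entropy-regularized OT plan between two uniform empirical measures.

    Returns the coupling as a list of rows (rows index xs, columns index ys).
    eps is the regularization strength (an illustrative default).
    """
    n, m = len(xs), len(ys)
    cost = [[sq_dist(x, y) for y in ys] for x in xs]
    log_a, log_b = -math.log(n), -math.log(m)
    f = [0.0] * n  # dual potential attached to xs
    g = [0.0] * m  # dual potential attached to ys
    for _ in range(n_iter):
        # Update f so that every row of the plan sums to 1/n.
        f = [eps * (log_a - logsumexp([(g[j] - cost[i][j]) / eps
                                       for j in range(m)]))
             for i in range(n)]
        # Update g so that every column of the plan sums to 1/m.
        g = [eps * (log_b - logsumexp([(f[i] - cost[i][j]) / eps
                                       for i in range(n)]))
             for j in range(m)]
    return [[math.exp((f[i] + g[j] - cost[i][j]) / eps) for j in range(m)]
            for i in range(n)]


def barycentric_map(plan, ys):
    """Send each source point to the plan-weighted average of the targets."""
    dim = len(ys[0])
    images = []
    for row in plan:
        mass = sum(row)
        images.append([sum(p * y[k] for p, y in zip(row, ys)) / mass
                       for k in range(dim)])
    return images


def qq_pairs(xs, ys, eps=0.5, n_iter=300):
    """Coordinate-wise (source, matched target) pairs for a Q-Q style plot."""
    images = barycentric_map(sinkhorn_plan(xs, ys, eps, n_iter), ys)
    dim = len(xs[0])
    return [[(x[k], t[k]) for x, t in zip(xs, images)] for k in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
    ys = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
    for k, pairs in enumerate(qq_pairs(xs, ys)):
        print(f"coordinate {k}: first pair {pairs[0]}")
```

Plotting the pairs for each coordinate against the diagonal gives one panel per dimension. Because the barycentric projection averages target points, larger values of `eps` pull the matched points toward the target mean, so the regularization strength affects how closely the panels follow the diagonal even when the two samples share a distribution.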