Two-sample Test using Projected Wasserstein Distance (2010.11970v4)
Abstract: We develop a projected Wasserstein distance for the two-sample test, a fundamental problem in statistics and machine learning: given two sets of samples, determine whether they come from the same distribution. In particular, we aim to circumvent the curse of dimensionality in the Wasserstein distance: when the dimension is high, it has diminishing testing power, which is inherently due to the slow concentration of Wasserstein metrics in high-dimensional spaces. A key contribution is to couple the test with an optimal projection that finds the low-dimensional linear mapping maximizing the Wasserstein distance between the projected probability distributions. We characterize the theoretical properties of the finite-sample convergence rate via integral probability metrics (IPMs) and present practical algorithms for computing this metric. Numerical examples validate our theoretical results.
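The core idea described in the abstract, maximizing a one-dimensional Wasserstein distance over linear projections, can be sketched in a few lines. The following is a simplified illustration, not the paper's exact algorithm: it uses a projected-subgradient-ascent heuristic over unit directions (closely related to the max-sliced Wasserstein distance), and the names `w1_1d` and `projected_w1` are hypothetical.

```python
import numpy as np

def w1_1d(x, y):
    """1-D Wasserstein-1 distance between equal-size empirical samples:
    the mean absolute difference of the sorted values."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

def projected_w1(X, Y, n_iter=200, lr=0.1, seed=0):
    """Search for a unit direction u maximizing W1(X @ u, Y @ u)
    by subgradient ascent, projecting u back onto the unit sphere.
    X and Y are (n, d) arrays with the same number of rows."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=X.shape[1])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        px, py = X @ u, Y @ u
        ix, iy = np.argsort(px), np.argsort(py)
        diff = px[ix] - py[iy]
        # Subgradient of mean_i |sorted(px)_i - sorted(py)_i| w.r.t. u,
        # holding the sorting permutations fixed.
        grad = (np.sign(diff)[:, None] * (X[ix] - Y[iy])).mean(axis=0)
        u += lr * grad
        u /= np.linalg.norm(u)  # project back onto the unit sphere
    return w1_1d(X @ u, Y @ u), u
```

For two high-dimensional Gaussian samples that differ only by a mean shift in one coordinate, the optimized direction concentrates on that coordinate, so the projected distance stays informative where the full-dimensional Wasserstein distance would be dominated by its slow concentration rate.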
- V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM Computing Surveys, vol. 41, no. 3, Sep. 2009.
- M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, “Network anomaly detection: methods, systems and tools,” IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 303–336, Jun. 2013.
- V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection for discrete sequences: A survey,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 5, pp. 823–839, Nov. 2010.
- Y. Xie, J. Huang, and R. Willett, “Change-point detection for high-dimensional time series with missing data,” IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 1, pp. 12–27, Dec. 2012.
- S. Li, Y. Xie, H. Dai, and L. Song, “M-statistic for kernel change-point detection,” in Advances in Neural Information Processing Systems, Dec. 2015, pp. 3366–3374.
- L. Xie and Y. Xie, “Sequential change detection by optimal weighted ℓ2 divergence,” IEEE Journal on Selected Areas in Information Theory, pp. 1–1, Apr. 2021.
- K. Borgwardt, A. Gretton, M. Rasch, H.-P. Kriegel, B. Schoelkopf, and A. Smola, “Integrating structured biological data by kernel maximum mean discrepancy,” Bioinformatics, vol. 22, pp. 49–57, Jul. 2006.
- P. Schober and T. Vetter, “Two-sample unpaired t tests in medical research,” Anesthesia and analgesia, vol. 129, p. 911, Oct. 2019.
- J. R. Lloyd and Z. Ghahramani, “Statistical model criticism using kernel two sample tests,” in Advances in Neural Information Processing Systems, vol. 28, Dec. 2015.
- K. Chwialkowski, H. Strathmann, and A. Gretton, “A kernel test of goodness of fit,” Proceedings of Machine Learning Research, vol. 48, pp. 2606–2615, Jun. 2016.
- B. Kim, R. Khanna, and O. O. Koyejo, “Examples are not enough, learn to criticize! criticism for interpretability,” in Advances in Neural Information Processing Systems, Dec. 2016, pp. 2280–2288.
- H. Hotelling, “The generalization of Student’s ratio,” The Annals of Mathematical Statistics, vol. 2, pp. 360–378, Aug. 1931.
- J. Pfanzagl and O. Sheynin, “Studies in the history of probability and statistics XLIV: A forerunner of the t-distribution,” Biometrika, vol. 83, no. 4, pp. 891–898, Dec. 1996.
- F. J. Massey Jr., “The Kolmogorov-Smirnov test for goodness of fit,” Journal of the American Statistical Association, vol. 46, no. 253, pp. 68–78, Apr. 1951.
- E. del Barrio, J. A. Cuesta-Albertos, C. Matrán, and J. M. Rodríguez-Rodríguez, “Tests of goodness of fit based on the l2-Wasserstein distance,” Annals of Statistics, vol. 27, no. 4, pp. 1230–1239, Aug. 1999.
- A. Ramdas, N. G. Trillos, and M. Cuturi, “On Wasserstein two-sample testing and related families of nonparametric tests,” Entropy, vol. 19, no. 2, Jan. 2017.
- A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, “A kernel two-sample test,” Journal of Machine Learning Research, vol. 13, pp. 723–773, Mar. 2012.
- A. Gretton, K. Fukumizu, Z. Harchaoui, and B. K. Sriperumbudur, “A fast, consistent kernel two-sample test,” in Advances in Neural Information Processing Systems, Dec. 2009, pp. 673–681.
- A. Gretton, D. Sejdinovic, H. Strathmann, S. Balakrishnan, M. Pontil, K. Fukumizu, and B. K. Sriperumbudur, “Optimal kernel choice for large-scale two-sample tests,” in Advances in Neural Information Processing Systems, Dec. 2012, pp. 1205–1213.
- A. Ramdas, S. J. Reddi, B. Póczos, A. Singh, and L. Wasserman, “On the decreasing power of kernel and distance based nonparametric hypothesis tests in high dimensions,” in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Jan. 2015.
- J. Bigot, E. Cazelles, and N. Papadakis, “Central limit theorems for Sinkhorn divergence between probability distributions on finite spaces and statistical applications,” Electronic Journal of Statistics, Dec. 2017.
- T. Kanamori, T. Suzuki, and M. Sugiyama, “f-divergence estimation and two-sample homogeneity test under semiparametric density-ratio models,” IEEE Transactions on Information Theory, vol. 58, no. 2, pp. 708–720, Sep. 2011.
- I. Kim, A. Ramdas, A. Singh, and L. Wasserman, “Classification accuracy as a proxy for two-sample testing,” The Annals of Statistics, vol. 49, no. 1, pp. 411–434, Feb. 2021.
- S. Wei, C. Lee, L. Wichers, G. Li, and J. S. Marron, “Direction-projection-permutation for high dimensional hypothesis tests,” Journal of Computational and Graphical Statistics, vol. 25, no. 2, pp. 549–569, May 2016.
- A. K. Ghosh and M. Biswas, “Distribution-free high-dimensional two-sample tests based on discriminating hyperplanes,” TEST, vol. 25, no. 3, pp. 525–547, Dec. 2015.
- J. W. Mueller and T. Jaakkola, “Principal differences analysis: Interpretable characterization of differences between distributions,” in Advances in Neural Information Processing Systems, vol. 28, Dec. 2015.
- T. Lin, C. Fan, N. Ho, M. Cuturi, and M. Jordan, “Projection robust Wasserstein distance and Riemannian optimization,” in Advances in Neural Information Processing Systems, vol. 33, Dec. 2020, pp. 9383–9397.
- T. Lin, Z. Zheng, E. Chen, M. Cuturi, and M. Jordan, “On projection robust optimal transport: Sample complexity and model misspecification,” in Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, vol. 130, Apr. 2021, pp. 262–270.
- M. Huang, S. Ma, and L. Lai, “A Riemannian block coordinate descent method for computing the projection robust Wasserstein distance,” arXiv preprint arXiv:2012.05199, 2021.
- M. Cuturi, “Sinkhorn distances: Lightspeed computation of optimal transport,” in Advances in Neural Information Processing Systems, Dec. 2013, p. 2292–2300.
- A. Genevay, L. Chizat, F. Bach, M. Cuturi, and G. Peyré, “Sample complexity of Sinkhorn divergences,” in Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, vol. 89, Apr. 2019, pp. 1574–1583.
- N. Bonneel, J. Rabin, G. Peyré, and H. Pfister, “Sliced and Radon Wasserstein barycenters of measures,” Journal of Mathematical Imaging and Vision, vol. 51, Apr. 2014.
- I. Deshpande, Y.-T. Hu, R. Sun, A. Pyrros, N. Siddiqui, S. Koyejo, Z. Zhao, D. Forsyth, and A. G. Schwing, “Max-sliced Wasserstein distance and its use for GANs,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2019, pp. 10640–10648.
- T. Manole, S. Balakrishnan, and L. Wasserman, “Minimax confidence intervals for the sliced Wasserstein distance,” arXiv preprint arXiv:1909.07862, 2019.
- P. Zhang, Q. Liu, D. Zhou, T. Xu, and X. He, “On the discrimination-generalization tradeoff in GANs,” in 6th International Conference on Learning Representations, ICLR, Feb. 2018.
- U. von Luxburg and O. Bousquet, “Distance-based classification with Lipschitz functions,” Journal of Machine Learning Research, vol. 5, pp. 669–695, Jun. 2004.
- F.-P. Paty and M. Cuturi, “Subspace robust Wasserstein distances,” in Proceedings of the 36th International Conference on Machine Learning, vol. 97, Jun. 2019, pp. 5072–5081.
- A. Maurer, “A vector-contraction inequality for Rademacher complexities,” in Algorithmic Learning Theory. Springer International Publishing, Sep. 2016, pp. 3–17.