
Canonical Variates in Wasserstein Metric Space (2405.15768v1)

Published 24 May 2024 in stat.ML, cs.AI, and cs.LG

Abstract: In this paper, we address the classification of instances each characterized not by a single point, but by a distribution on a vector space. We employ the Wasserstein metric to measure distances between distributions, which are then used by distance-based classification algorithms such as k-nearest neighbors, k-means, and pseudo-mixture modeling. Central to our investigation is dimension reduction within the Wasserstein metric space to enhance classification accuracy. We introduce a novel approach grounded in the principle of maximizing Fisher's ratio, defined as the ratio of between-class variation to within-class variation. The directions in which this ratio is maximized are termed discriminant coordinates or canonical variates axes. In practice, we define the between-class and within-class variations as the average squared distances between pairs of instances belonging to different classes and to the same class, respectively. This ratio optimization is achieved through an iterative algorithm, which alternates between optimal transport and maximization steps within the vector space. We conduct empirical studies to assess the algorithm's convergence and, through experimental validation, demonstrate that our dimension reduction technique substantially enhances classification performance. Moreover, our method outperforms well-established algorithms that operate on vector representations derived from distributional data. It also exhibits robustness against variations in the distributional representations of data clouds.
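The pairwise definition of Fisher's ratio given in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes each instance is a point cloud with uniform weights and equal size, projects onto a single fixed direction, and uses the closed form for the 1-D squared 2-Wasserstein distance (the mean squared difference of sorted samples). The function names `w2_squared_1d` and `fisher_ratio` and the toy data are hypothetical.

```python
import numpy as np


def w2_squared_1d(x, y):
    """Squared 2-Wasserstein distance between two equal-size 1-D samples
    with uniform weights: match sorted values and average squared gaps."""
    x, y = np.sort(x), np.sort(y)
    return float(np.mean((x - y) ** 2))


def fisher_ratio(clouds, labels, direction):
    """Average squared distance over different-class pairs divided by the
    average over same-class pairs, after projecting onto `direction`."""
    w = np.asarray(direction, dtype=float)
    w = w / np.linalg.norm(w)
    projected = [np.asarray(c) @ w for c in clouds]
    between, within = [], []
    for i in range(len(clouds)):
        for j in range(i + 1, len(clouds)):
            d = w2_squared_1d(projected[i], projected[j])
            (within if labels[i] == labels[j] else between).append(d)
    return float(np.mean(between) / np.mean(within))


# Hypothetical toy data: four instances, each a cloud of 50 points in 2-D.
# The two classes differ only by a shift along the first axis.
rng = np.random.default_rng(0)
shifts = (0.0, 0.0, 10.0, 10.0)
clouds = [rng.normal(size=(50, 2)) + np.array([s, 0.0]) for s in shifts]
labels = [0, 0, 1, 1]

ratio_x = fisher_ratio(clouds, labels, [1.0, 0.0])  # direction that separates the classes
ratio_y = fisher_ratio(clouds, labels, [0.0, 1.0])  # direction with no class signal
print(ratio_x, ratio_y)
```

In this toy setting the ratio along the first axis should be much larger than along the second, which is the sense in which a direction is "discriminant". The method described in the abstract searches for such directions with an iterative alternation of optimal transport and maximization steps, which this sketch does not attempt to reproduce.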
