Graph Integration for Diffusion-Based Manifold Alignment (2410.22978v1)
Abstract: Data from individual observations can originate from various sources or modalities but are often intrinsically linked. Multimodal data integration can enrich information content compared to single-source data. Manifold alignment is a form of data integration that seeks a shared, underlying low-dimensional representation of multiple data sources, one that emphasizes similarities between alternative representations of the same entities. Semi-supervised manifold alignment relies on partially known correspondences between domains, either through shared features or through other known associations. In this paper, we introduce two semi-supervised manifold alignment methods. The first method, Shortest Paths on the Union of Domains (SPUD), forms a unified graph structure using known correspondences to establish graph edges. By learning inter-domain geodesic distances, SPUD creates a global, multi-domain structure. The second method, Manifold Alignment via Stochastic Hopping (MASH), learns local geometry within each domain and forms a joint diffusion operator using known correspondences to iteratively learn new inter-domain correspondences through a random-walk approach. Through the diffusion process, MASH forms a coupling matrix that links heterogeneous domains into a unified structure. We compare SPUD and MASH with existing semi-supervised manifold alignment methods and show that they outperform competing methods both in aligning true correspondences and in cross-domain classification. In addition, we show how these methods can be applied to transfer label information between domains.
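The SPUD idea described in the abstract (join the domains through edges at known correspondences, then read off inter-domain geodesic distances) can be illustrated with a minimal sketch. This is not the authors' implementation: the k-nearest-neighbor graph construction, the function names `knn_edges`, `union_graph`, and `shortest_paths`, the `anchor_weight` parameter with its default of zero, and the toy data are all assumptions made for illustration.

```python
import heapq
import math


def knn_edges(points, k):
    """Return (i, j, distance) edges from each point to its k nearest neighbors."""
    edges = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        edges.extend((i, j, d) for d, j in dists[:k])
    return edges


def union_graph(domain_a, domain_b, anchors, k=2, anchor_weight=0.0):
    """Build one undirected graph over both domains.

    Nodes 0..len(domain_a)-1 belong to domain A; domain B nodes are offset
    by len(domain_a). Each known correspondence (a, b) in `anchors` becomes
    an inter-domain edge of weight `anchor_weight` (an assumed choice).
    """
    n_a = len(domain_a)
    graph = {i: {} for i in range(n_a + len(domain_b))}

    def add(u, v, w):
        # Keep the smallest weight if the same edge is proposed twice.
        w = min(w, graph[u].get(v, math.inf))
        graph[u][v] = w
        graph[v][u] = w

    for i, j, d in knn_edges(domain_a, k):
        add(i, j, d)
    for i, j, d in knn_edges(domain_b, k):
        add(n_a + i, n_a + j, d)
    for a, b in anchors:
        add(a, n_a + b, anchor_weight)
    return graph


def shortest_paths(graph, source):
    """Dijkstra's algorithm: geodesic distance from `source` to every node."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy data: the same four entities measured in two different feature spaces.
domain_a = [(0.0,), (1.0,), (2.0,), (3.0,)]
domain_b = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]

# Only the first and last entities have known correspondences.
graph = union_graph(domain_a, domain_b, anchors=[(0, 0), (3, 3)], k=2)

# Distances from A's point 1 to every node; B's nodes are indices 4..7.
dist_from_a1 = shortest_paths(graph, source=1)
```

In this toy example the two domains have no features in common, yet `dist_from_a1` assigns a finite distance from a point in A to every point in B because paths can cross at the anchor edges. A matrix of such cross-domain distances could then be passed to an embedding method to obtain a shared representation.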