Comparing representations of high-dimensional data with persistent homology: a case study in neuroimaging (2306.13802v2)
Abstract: Despite much attention, the comparison of reduced-dimension representations of high-dimensional data remains a challenging problem in multiple fields, especially when representations remain high-dimensional compared to sample size. We offer a framework for evaluating the topological similarity of high-dimensional representations of very high-dimensional data, a regime where topological structure is more likely captured in the distribution of topological "noise" than a few prominent generators. Treating each representational map as a metric embedding, we compute the Vietoris-Rips persistence of its image. We then use the topological bootstrap to analyze the re-sampling stability of each representation, assigning a "prevalence score" for each nontrivial basis element of its persistence module. Finally, we compare the persistent homology of representations using a prevalence-weighted variant of the Wasserstein distance. Notably, our method is able to compare representations derived from different samples of the same distribution and, in particular, is not restricted to comparisons of graphs on the same vertex set. In addition, representations need not lie in the same metric space. We apply this analysis to a cross-sectional sample of representations of functional neuroimaging data in a large cohort and hierarchically cluster the representations under the prevalence-weighted Wasserstein distance. We find that the ambient dimension of a representation is a stronger predictor of the number and stability of topological features than its decomposition rank. Our findings suggest that important topological information lies in repeatable, low-persistence homology generators, whose distributions capture important and interpretable differences between high-dimensional data representations.