Quiver Laplacians and Feature Selection (2404.06993v1)
Abstract: The challenge of selecting the most relevant features of a given dataset arises ubiquitously in data analysis and dimensionality reduction. However, features found to be of high importance for the entire dataset may not be relevant to subsets of interest, and vice versa. Given a feature selector and a fixed decomposition of the data into subsets, we describe a method for identifying selected features that are compatible with the decomposition into subsets. We achieve this by recasting the problem of finding compatible features as one of finding sections of a suitable quiver representation. In order to approximate such sections, we then introduce a Laplacian operator for quiver representations valued in Hilbert spaces. We provide explicit bounds on how the spectrum of a quiver Laplacian changes when the representation and the underlying quiver are modified in certain natural ways. Finally, we apply this machinery to the study of peak-calling algorithms that measure chromatin accessibility in single-cell data. We demonstrate that eigenvectors of the associated quiver Laplacian yield locally and globally compatible features.
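To make the link between sections and Laplacian eigenvectors concrete, here is a minimal finite-dimensional sketch. It is not the paper's implementation: it assumes a real representation (finite-dimensional spaces rather than general Hilbert spaces), a coboundary-style map δ with (δx)_e = A_e x_{t(e)} − x_{h(e)} for each arrow e, and a Laplacian of the form L = δᵀδ. The function names `quiver_laplacian` and `approximate_sections` are hypothetical. Under these assumptions the kernel of L is exactly the space of sections (families (x_v) with A_e x_{t(e)} = x_{h(e)} on every arrow), so eigenvectors with small eigenvalues act as approximate sections.

```python
import numpy as np

def quiver_laplacian(dims, arrows):
    """Return (L, delta) for a real quiver representation (hypothetical helper).

    dims:   vector-space dimension at each vertex.
    arrows: list of (tail, head, A) with A of shape (dims[head], dims[tail]).
    Assumed convention: (delta x)_e = A_e x_tail - x_head and L = delta^T delta.
    """
    offsets = np.concatenate([[0], np.cumsum(dims)]).astype(int)
    n = int(offsets[-1])
    blocks = []
    for tail, head, A in arrows:
        A = np.asarray(A, dtype=float)
        if A.shape != (dims[head], dims[tail]):
            raise ValueError(f"arrow {tail}->{head}: bad shape {A.shape}")
        block = np.zeros((dims[head], n))
        # += then -= so that a loop (tail == head) correctly yields A - I.
        block[:, offsets[tail]:offsets[tail + 1]] += A
        block[:, offsets[head]:offsets[head + 1]] -= np.eye(dims[head])
        blocks.append(block)
    delta = np.vstack(blocks) if blocks else np.zeros((0, n))
    return delta.T @ delta, delta

def approximate_sections(L, k):
    """Eigenpairs for the k smallest eigenvalues of the symmetric matrix L."""
    evals, evecs = np.linalg.eigh(L)  # eigh sorts eigenvalues in ascending order
    return evals[:k], evecs[:, :k]

# Toy quiver (illustrative only): vertex 0 carries R^2, vertex 1 carries R^1,
# joined by two parallel arrows 0 -> 1 given by the two coordinate projections.
dims = [2, 1]
arrows = [(0, 1, [[1.0, 0.0]]), (0, 1, [[0.0, 1.0]])]
L, delta = quiver_laplacian(dims, arrows)

# A section needs x0 = (a, a) and x1 = a, so the kernel is spanned by
# (1, 1, 1) / sqrt(3), up to sign.
evals, vecs = approximate_sections(L, 1)

# Dropping an arrow removes one positive semidefinite summand from L.
L_drop, _ = quiver_laplacian(dims, arrows[:1])
full = np.linalg.eigvalsh(L)
dropped = np.linalg.eigvalsh(L_drop)
print(full, dropped)
```

Because L = Σ_e δ_eᵀδ_e is a sum of one positive semidefinite term per arrow, deleting an arrow can only lower each eigenvalue (by Weyl's inequality); here the spectrum moves from (0, 1, 3) to (0, 0, 2). This is only a simple instance of the kind of modification the abstract refers to, not the paper's own bound.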