A spectral method for multi-view subspace learning using the product of projections (2410.19125v1)
Abstract: Multi-view data provides complementary information on the same set of observations, with multi-omics and multimodal sensor data being common examples. Analyzing such data typically requires distinguishing between shared (joint) and unique (individual) signal subspaces from noisy, high-dimensional measurements. Despite many proposed methods, the conditions for reliably identifying joint and individual subspaces remain unclear. We rigorously quantify these conditions, which depend on the ratio of the signal rank to the ambient dimension, principal angles between true subspaces, and noise levels. Our approach characterizes how spectrum perturbations of the product of projection matrices, derived from each view's estimated subspaces, affect subspace separation. Using these insights, we provide an easy-to-use and scalable estimation algorithm. In particular, we employ rotational bootstrap and random matrix theory to partition the observed spectrum into joint, individual, and noise subspaces. Diagnostic plots visualize this partitioning, providing practical and interpretable insights into the estimation performance. In simulations, our method estimates joint and individual subspaces more accurately than existing approaches. Applications to multi-omics data from colorectal cancer patients and a nutrigenomic study of mice demonstrate improved performance in downstream predictive tasks.
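As a rough illustration of the central object in the abstract, the sketch below builds a projection matrix from each view's estimated subspace and inspects the singular values of their product, which equal the cosines of the principal angles between the two subspaces. It is a toy example under stated assumptions, not the paper's algorithm: the function names (`projection`, `product_spectrum`), the simulated data, the convention that observations are rows, and the assumed-known ranks are all hypothetical, and the rotational bootstrap and random matrix theory cutoffs used to partition the spectrum are not implemented.

```python
import numpy as np


def projection(X, rank):
    """Projection onto the span of the top-`rank` left singular vectors of X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    U = U[:, :rank]
    return U @ U.T


def product_spectrum(X1, X2, r1, r2):
    """Singular values of P1 @ P2, i.e. cosines of the principal angles."""
    P1 = projection(X1, r1)
    P2 = projection(X2, r2)
    return np.linalg.svd(P1 @ P2, compute_uv=False)


# Toy data (assumption): n observations in rows, two views with p features each.
rng = np.random.default_rng(0)
n, p = 200, 50

# One orthonormal basis split into joint and view-specific directions.
Q, _ = np.linalg.qr(rng.standard_normal((n, 6)))
joint, indiv1, indiv2 = Q[:, :2], Q[:, 2:4], Q[:, 4:6]


def make_view(scores, signal=20.0, noise=0.5):
    """Low-rank signal with the given score basis plus Gaussian noise."""
    loadings = rng.standard_normal((scores.shape[1], p))
    return signal * scores @ loadings + noise * rng.standard_normal((n, p))


X1 = make_view(np.hstack([joint, indiv1]))
X2 = make_view(np.hstack([joint, indiv2]))

# Ranks are taken as known here; in practice they must be estimated.
sv = product_spectrum(X1, X2, r1=4, r2=4)
print(np.round(sv[:6], 3))
```

In this simulation the two shared directions should produce singular values close to 1, while the view-specific directions produce much smaller ones; deciding where to cut between these groups in noisy data is the problem the abstract addresses.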