
A spectral method for multi-view subspace learning using the product of projections (2410.19125v1)

Published 24 Oct 2024 in stat.ML, cs.LG, math.ST, stat.CO, stat.ME, and stat.TH

Abstract: Multi-view data provides complementary information on the same set of observations, with multi-omics and multimodal sensor data being common examples. Analyzing such data typically requires distinguishing between shared (joint) and unique (individual) signal subspaces from noisy, high-dimensional measurements. Despite many proposed methods, the conditions for reliably identifying joint and individual subspaces remain unclear. We rigorously quantify these conditions, which depend on the ratio of the signal rank to the ambient dimension, principal angles between true subspaces, and noise levels. Our approach characterizes how spectrum perturbations of the product of projection matrices, derived from each view's estimated subspaces, affect subspace separation. Using these insights, we provide an easy-to-use and scalable estimation algorithm. In particular, we employ rotational bootstrap and random matrix theory to partition the observed spectrum into joint, individual, and noise subspaces. Diagnostic plots visualize this partitioning, providing practical and interpretable insights into the estimation performance. In simulations, our method estimates joint and individual subspaces more accurately than existing approaches. Applications to multi-omics data from colorectal cancer patients and a nutrigenomic study of mice demonstrate improved performance in downstream predictive tasks.
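The central object in the abstract, the spectrum of the product of the two views' projection matrices, can be illustrated numerically. The Python/NumPy sketch below is not the authors' algorithm. It relies only on the standard identity that, for orthonormal bases U1 and U2, the nonzero eigenvalues of P1 P2 (with Pk = Uk Ukᵀ) are the squared singular values of U1ᵀU2, i.e. the squared cosines of the principal angles between the two subspaces. Eigenvalues near 1 point to shared (joint) directions, and eigenvalues near 0 to view-specific (individual) ones. The helper names (`estimate_subspace`, `projection_product_spectrum`, `split_ranks`, `make_view`), the simulated data, the 1.1 safety factor on the noise edge, and the fixed 0.5 cutoff are all hypothetical stand-ins for the paper's rotational-bootstrap and random-matrix-theory partition.

```python
import numpy as np


def estimate_subspace(x, sigma):
    """Hypothetical helper: orthonormal basis for a noisy view's column space.

    Keeps left singular vectors whose singular value exceeds a noise edge.
    sigma * (sqrt(n) + sqrt(p)) approximates the largest singular value of an
    n x p matrix of i.i.d. N(0, sigma^2) noise; the 1.1 factor is an
    arbitrary safety margin chosen for this sketch.
    """
    n, p = x.shape
    u, s, _ = np.linalg.svd(x, full_matrices=False)  # s is sorted descending
    edge = 1.1 * sigma * (np.sqrt(n) + np.sqrt(p))
    rank = int(np.sum(s > edge))
    return u[:, :rank]


def projection_product_spectrum(u1, u2):
    """Nonzero eigenvalues of P1 @ P2 (Pk = uk @ uk.T), largest first.

    Computed as squared singular values of u1.T @ u2, i.e. cos^2 of the
    principal angles between span(u1) and span(u2).
    """
    s = np.linalg.svd(u1.T @ u2, compute_uv=False)
    return np.clip(s, 0.0, 1.0) ** 2


def split_ranks(cos2, r1, r2, cutoff=0.5):
    """Hypothetical fixed-cutoff partition into joint and individual ranks."""
    joint = int(np.sum(cos2 > cutoff))
    return joint, r1 - joint, r2 - joint


rng = np.random.default_rng(0)
n, p1, p2, a, sigma = 200, 100, 120, 100.0, 1.0

# Orthonormal directions in observation space: z is shared by both views,
# u1 and u2 are specific to view 1 and view 2.
z, u1, u2 = np.linalg.qr(rng.standard_normal((n, 3)))[0].T


def make_view(shared, own, p):
    """Hypothetical simulator: rank-2 signal a * (shared w^T + own v^T) + noise."""
    w, v = np.linalg.qr(rng.standard_normal((p, 2)))[0].T
    signal = a * (np.outer(shared, w) + np.outer(own, v))
    return signal + sigma * rng.standard_normal((n, p))


x1 = make_view(z, u1, p1)
x2 = make_view(z, u2, p2)

b1 = estimate_subspace(x1, sigma)
b2 = estimate_subspace(x2, sigma)
cos2 = projection_product_spectrum(b1, b2)
joint, indiv1, indiv2 = split_ranks(cos2, b1.shape[1], b2.shape[1])

# Cross-check the identity by forming P1 @ P2 explicitly (n x n, not symmetric).
proj1, proj2 = b1 @ b1.T, b2 @ b2.T
eig_direct = np.sort(np.linalg.eigvals(proj1 @ proj2).real)[::-1][: cos2.size]

print("view ranks:", b1.shape[1], b2.shape[1])
print("cos^2 of principal angles:", np.round(cos2, 4))
print("joint / individual ranks:", joint, indiv1, indiv2)
```

The simulation is built so that each view has true rank 2 (one shared and one view-specific direction), so a correct partition yields a joint rank of 1 and individual ranks of 1 for each view. With weaker signals or more noise, the leading eigenvalue drifts below 1 and the trailing ones drift above 0; that is where a fixed cutoff becomes fragile and a data-driven threshold, such as the one the abstract describes, is needed.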
