Regularised Canonical Correlation Analysis: graphical lasso, biplots and beyond (2403.02979v1)

Published 5 Mar 2024 in stat.ME, math.ST, stat.AP, and stat.TH

Abstract: Recent developments in regularized Canonical Correlation Analysis (CCA) promise powerful methods for high-dimensional, multiview data analysis. However, justifying the structural assumptions behind many popular approaches remains a challenge, and features of realistic biological datasets pose practical difficulties that are seldom discussed. We propose a novel CCA estimator rooted in an assumption of conditional independencies and based on the Graphical Lasso. Our method has desirable theoretical guarantees and good empirical performance, demonstrated through extensive simulations and real-world biological datasets. Recognizing the difficulties of model selection in high dimensions and other practical challenges of applying CCA in real-world settings, we introduce a novel framework for evaluating and interpreting regularized CCA models in the context of Exploratory Data Analysis (EDA), which we hope will empower researchers and pave the way for wider adoption.
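The abstract does not spell out the estimator, so the following is only a minimal sketch of one plausible reading: fit the Graphical Lasso to the stacked views, then run classical CCA on the blocks of the implied covariance matrix. The function name `glasso_cca`, the penalty value `alpha=0.05`, and the column standardisation step are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso


def glasso_cca(X, Y, alpha=0.05, n_components=1):
    """Plug-in CCA from a graphical-lasso estimate of the joint covariance.

    Hypothetical helper for illustration; not the authors' implementation.
    Returned weights refer to the standardised variables.
    """
    p = X.shape[1]
    Z = np.hstack([X, Y])
    # Assumption: standardise columns so a single penalty level is
    # comparable across variables.
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)

    # Sparse joint precision matrix; covariance_ is its positive-definite inverse.
    S = GraphicalLasso(alpha=alpha).fit(Z).covariance_
    Sxx, Sxy, Syy = S[:p, :p], S[:p, p:], S[p:, p:]

    def inv_sqrt(M):
        # Symmetric inverse square root via eigendecomposition.
        w, V = np.linalg.eigh(M)
        return (V / np.sqrt(w)) @ V.T

    Sxx_is, Syy_is = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Classical CCA: SVD of the whitened cross-covariance block.
    U, rho, Vt = np.linalg.svd(Sxx_is @ Sxy @ Syy_is)
    A = Sxx_is @ U[:, :n_components]      # weights for the X view
    B = Syy_is @ Vt.T[:, :n_components]   # weights for the Y view
    return A, B, rho[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shared = rng.normal(size=200)
    X = rng.normal(size=(200, 5))
    Y = rng.normal(size=(200, 4))
    X[:, 0] += shared  # one shared latent signal links the two views
    Y[:, 0] += shared
    A, B, rho = glasso_cca(X, Y)
    print("estimated leading canonical correlation:", rho[0])
```

In this sketch the sparsity is imposed on the joint precision matrix rather than directly on the canonical weights, which is one way an assumption of conditional independencies could enter a CCA estimator. In practice the penalty `alpha` would have to be chosen by some model-selection procedure, which the abstract itself flags as difficult in high dimensions.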
