Papers
Topics
Authors
Recent
Search
2000 character limit reached

Direct estimation and inference of higher-level correlations from lower-level measurements with applications in gene-pathway and proteomics studies

Published 10 Jul 2024 in stat.ME | (2407.07809v1)

Abstract: This paper tackles the challenge of estimating correlations between higher-level biological variables (e.g., proteins and gene pathways) when only lower-level measurements are directly observed (e.g., peptides and individual genes). Existing methods typically aggregate lower-level data into higher-level variables and then estimate correlations based on the aggregated data. However, different data aggregation methods can yield varying correlation estimates as they target different higher-level quantities. Our solution is a latent factor model that directly estimates these higher-level correlations from lower-level data without the need for data aggregation. We further introduce a shrinkage estimator to ensure the positive definiteness and improve the accuracy of the estimated correlation matrix. Furthermore, we establish the asymptotic normality of our estimator, enabling efficient computation of p-values for the identification of significant correlations. The effectiveness of our approach is demonstrated through comprehensive simulations and the analysis of proteomics and gene expression datasets. We develop the R package highcor for implementing our method.

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (2)

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 0 likes about this paper.