High-Dimensional Multi-Study Multi-Modality Covariate-Augmented Generalized Factor Model (2507.09889v1)
Abstract: Latent factor models that integrate data from multiple sources/studies or modalities have garnered considerable attention across various disciplines. However, existing methods predominantly focus either on multi-study integration or multi-modality integration, rendering them insufficient for analyzing the diverse modalities measured across multiple studies. To address this limitation and cater to practical needs, we introduce a high-dimensional generalized factor model that seamlessly integrates multi-modality data from multiple studies, while also accommodating additional covariates. We conduct a thorough investigation of the identifiability conditions to enhance the model's interpretability. To tackle the complexity of high-dimensional nonlinear integration caused by four large latent random matrices, we utilize a variational lower bound to approximate the observed log-likelihood by employing a variational posterior distribution. By profiling the variational parameters, we establish the asymptotical properties of estimators for model parameters using M-estimation theory. Furthermore, we devise a computationally efficient variational EM algorithm to execute the estimation process and a criterion to determine the optimal number of both study-shared and study-specific factors. Extensive simulation studies and a real-world application show that the proposed method significantly outperforms existing methods in terms of estimation accuracy and computational efficiency. The R package for the proposed method is publicly accessible at https://CRAN.R-project.org/package=MMGFM.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.