
High-Dimensional Undirected Graphical Models for Arbitrary Mixed Data (2211.11700v2)

Published 21 Nov 2022 in stat.ML, cs.LG, and stat.ME

Abstract: Graphical models are an important tool in exploring relationships between variables in complex, multivariate data. Methods for learning such graphical models are well developed in the case where all variables are either continuous or discrete, including in high dimensions. However, in many applications data span variables of different types (e.g. continuous, count, binary, ordinal, etc.), whose principled joint analysis is nontrivial. Latent Gaussian copula models, in which all variables are modeled as transformations of underlying jointly Gaussian variables, represent a useful approach. Recent advances have shown how the binary-continuous case can be tackled, but the general mixed variable type regime remains challenging. In this work, we make the simple yet useful observation that classical ideas concerning polychoric and polyserial correlations can be leveraged in a latent Gaussian copula framework. Building on this observation, we propose flexible and scalable methodology for data with variables of entirely general mixed type. We study the key properties of the approaches theoretically and empirically, via extensive simulations as well as an illustrative application to data from the UK Biobank concerning COVID-19 risk factors.
