High-dimensional regression with outcomes of mixed-type using the multivariate spike-and-slab LASSO (2506.13007v1)
Abstract: We consider a high-dimensional multi-outcome regression in which $q$ possibly dependent binary and continuous outcomes are regressed onto $p$ covariates. We model the observed outcome vector as a partially observed latent realization from a multivariate linear regression model. Our goal is to simultaneously estimate a sparse matrix ($B$) of latent regression coefficients (i.e., partial covariate effects) and a sparse latent residual precision matrix ($\Omega$), which induces partial correlations between the observed outcomes. To this end, we specify continuous spike-and-slab priors on all entries of $B$ and on the off-diagonal elements of $\Omega$ and introduce a Monte Carlo Expectation-Conditional Maximization algorithm to compute the maximum a posteriori estimate of the model parameters. Under a set of mild assumptions, we derive the posterior contraction rate for our model in high-dimensional regimes where both $p$ and $q$ diverge with the sample size $n$, and we establish a sure screening property, which implies that, as $n$ increases, we can recover all truly non-zero elements of $B$ with probability tending to one. We demonstrate the excellent finite-sample properties of our proposed method, which we call mixed-mSSL, using extensive simulation studies and three applications spanning medicine to ecology.
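
To make the latent-variable setup concrete, the following is a minimal simulation sketch of the data-generating model described in the abstract, not the authors' code. It assumes a probit-style link in which each binary outcome is the indicator that its latent coordinate is positive; the abstract does not state the thresholding rule. The function name `simulate_mixed_outcomes`, the 10% sparsity level in $B$, and the tridiagonal form of $\Omega$ are all illustrative assumptions.

```python
import numpy as np


def simulate_mixed_outcomes(n, p, q_cont, q_bin, seed=0):
    """Simulate mixed-type outcomes from a latent multivariate linear regression.

    Hypothetical helper for illustration only; all settings are assumptions.
    """
    rng = np.random.default_rng(seed)
    q = q_cont + q_bin

    # Sparse latent coefficient matrix B (p x q): roughly 10% of entries non-zero.
    B = rng.normal(size=(p, q)) * (rng.random((p, q)) < 0.1)

    # Sparse latent residual precision matrix Omega (q x q). A tridiagonal,
    # strictly diagonally dominant matrix is symmetric positive definite.
    Omega = np.eye(q) + 0.4 * (np.eye(q, k=1) + np.eye(q, k=-1))
    Sigma = np.linalg.inv(Omega)

    # Latent outcomes: Z = X B + E, with rows of E drawn from N(0, Omega^{-1}).
    X = rng.normal(size=(n, p))
    E = rng.multivariate_normal(np.zeros(q), Sigma, size=n)
    Z = X @ B + E

    # Partial observation: continuous outcomes are seen directly, while binary
    # outcomes reveal only the sign of the latent value (assumed threshold at 0).
    Y = Z.copy()
    Y[:, q_cont:] = (Z[:, q_cont:] > 0).astype(float)
    return X, Y, Z, B, Omega


X, Y, Z, B, Omega = simulate_mixed_outcomes(n=50, p=10, q_cont=2, q_bin=3)
```

In this sketch the first `q_cont` columns of `Y` equal the latent `Z`, and the remaining `q_bin` columns are 0/1 indicators, so the off-diagonal entries of `Omega` induce dependence across both outcome types.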