Censored Graphical Horseshoe (CGHS)
- Censored Graphical Horseshoe is a Bayesian method that extends the Graphical Horseshoe to estimate sparse precision matrices in Gaussian graphical models with censored and missing observations.
- It employs a latent-variable strategy combined with global-local Horseshoe shrinkage to enable efficient posterior inference in complex, high-dimensional data.
- Empirical studies show that CGHS achieves lower estimation errors, higher true positive rates, and near-zero false discovery rates across varied censoring regimes.
The Censored Graphical Horseshoe (CGHS) is a Bayesian framework for sparse precision matrix estimation in Gaussian graphical models, designed to accommodate data subject to censoring and arbitrary missingness. CGHS generalizes the Graphical Horseshoe (GHS) method, extending its sparse Bayesian regression capabilities to cases where some variables are only partially observed due to detection limits or absences in the measurement process. By introducing a latent variable augmentation scheme and leveraging the adaptive global-local shrinkage properties of the Horseshoe prior, CGHS enables efficient posterior inference even under incomplete data modalities prevalent in biomedical, environmental, and other data-rich scientific domains (Mai et al., 10 Jan 2026).
1. Problem Formulation
CGHS addresses inference for precision matrices in mean-zero Gaussian graphical models, given i.i.d. samples $x_1, \dots, x_n \sim \mathcal{N}_p(0, \Omega^{-1})$. In many domains, such as qPCR, environmental assays, and single-cell studies, not all coordinates $x_{ij}$ are fully observed: measurements may be left-censored at known thresholds $l_j$ (e.g., detection limits), or may be entirely missing. Formally, each recorded datum is defined by

$$y_{ij} = \begin{cases} x_{ij}, & x_{ij} > l_j, \\ l_j, & x_{ij} \le l_j, \end{cases}$$
with additional missingness. Observed, censored, and missing indices for sample $i$ are collected in the sets $O_i$, $C_i$, and $M_i$. The observed-data likelihood for left censoring,

$$L(\Omega) \;=\; \prod_{i=1}^{n} \int_{\prod_{j \in C_i} (-\infty,\, l_j]} \int_{\mathbb{R}^{|M_i|}} \phi_p\big((y_{O_i}, x_{C_i}, x_{M_i});\, 0,\, \Omega^{-1}\big)\, dx_{M_i}\, dx_{C_i},$$
comprehensively models the incomplete observation structure. This likelihood recovers the fully-observed model when censoring and missingness are absent.
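As a deliberately simplified, univariate illustration of this likelihood (the function name and toy data are hypothetical, not the package's API), the sketch below scores fully observed points by their Gaussian density and left-censored points by the probability mass below the detection limit:

```python
import numpy as np
from scipy.stats import norm

def censored_loglik_1d(y, censored, mu=0.0, sigma=1.0, limit=-1.0):
    """Log-likelihood of left-censored Gaussian data (univariate sketch).

    Observed entries contribute the density phi(y_i); entries censored at the
    detection limit contribute the tail mass Phi(limit) below the limit.
    """
    y = np.asarray(y, dtype=float)
    censored = np.asarray(censored, dtype=bool)
    ll = norm.logpdf(y[~censored], loc=mu, scale=sigma).sum()
    ll += censored.sum() * norm.logcdf(limit, loc=mu, scale=sigma)
    return ll

y = np.array([0.3, -1.0, 1.2, -1.0])      # -1.0 entries sit at the detection limit
cens = np.array([False, True, False, True])
print(censored_loglik_1d(y, cens))
```

In the multivariate case the censored tail mass becomes the integral over the censored block of the joint normal, which is exactly what the latent-variable scheme below avoids computing directly.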
2. Latent-Variable Representation
To facilitate posterior inference under censoring and missingness, CGHS employs a latent-variable strategy, introducing complete latent vectors $z_i \in \mathbb{R}^p$ with $z_i \sim \mathcal{N}_p(0, \Omega^{-1})$. Observed data arise via deterministic or truncated mappings from $z_i$: $y_{ij} = z_{ij}$ for $j \in O_i$; $y_{ij} = l_j$ with $z_{ij} \le l_j$ for $j \in C_i$; and $y_{ij}$ is unrecorded for $j \in M_i$. The joint density factors as a product of multivariate normals and indicator functions truncating censored and missing entries. Posterior computation alternates between imputation (sampling censored/missing $z_{ij}$) and precision matrix updates, enabling inference even when entire data blocks are unobserved.
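One coordinate of the imputation step can be sketched with the Gaussian full conditional implied by the precision matrix; the function name and the two-variable example below are illustrative, not the package's interface:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

def impute_entry(z, j, Omega, status, limit):
    """One coordinate of the latent-variable update (illustrative sketch).

    Conditional on the other coordinates, z_j is Gaussian with
    mean -(1/omega_jj) * sum_{k != j} omega_jk z_k and variance 1/omega_jj.
    Censored entries are drawn from this conditional truncated to (-inf, limit];
    missing entries are drawn from the full conditional; observed entries are fixed.
    """
    omega_jj = Omega[j, j]
    mu = -(Omega[j, :] @ z - omega_jj * z[j]) / omega_jj
    sd = 1.0 / np.sqrt(omega_jj)
    if status == "observed":
        return z[j]                       # keep the recorded value
    if status == "censored":
        b = (limit - mu) / sd             # standardized upper bound
        return truncnorm.rvs(-np.inf, b, loc=mu, scale=sd, random_state=rng)
    return rng.normal(mu, sd)             # missing: untruncated full conditional
```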
3. Horseshoe Prior and Model Specification
For inducing sparsity, CGHS places a global-local Horseshoe prior on the off-diagonal elements of $\Omega$:

$$\omega_{jk} \mid \lambda_{jk}, \tau \;\sim\; \mathcal{N}(0, \lambda_{jk}^2 \tau^2), \qquad \lambda_{jk} \sim \mathrm{C}^{+}(0,1), \qquad \tau \sim \mathrm{C}^{+}(0,1), \qquad 1 \le j < k \le p.$$

Diagonal elements are assigned weakly informative priors or estimated via nodewise residual variances, subject to positive-definiteness of $\Omega$. This prior structure yields adaptive shrinkage, favoring near-zero estimates for non-edges while preserving large signals, and is robust to the inclusion of censored or missing measurements.
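A quick simulation illustrates the shrinkage profile of the Horseshoe prior described above; the helper below is a sketch (half-Cauchy local scales obtained as the absolute value of a standard Cauchy draw), not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def horseshoe_draws(n_draws, tau=1.0):
    """Draw entries from the Horseshoe prior:
    omega ~ N(0, lam^2 * tau^2), lam ~ half-Cauchy(0, 1)."""
    lam = np.abs(rng.standard_cauchy(n_draws))   # half-Cauchy local scales
    return rng.normal(0.0, lam * tau)

draws = horseshoe_draws(100_000)
# The pole at zero concentrates mass near the origin, while the heavy
# Cauchy tails leave large signals essentially unshrunk.
print(np.mean(np.abs(draws) < 0.1), np.mean(np.abs(draws) > 10))
```

For comparison, a standard normal puts only about 8% of its mass in $(-0.1, 0.1)$ and essentially none beyond $\pm 10$; the Horseshoe places substantially more mass in both regions at once, which is the behavior that drives near-zero FDR alongside high TPR.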
4. Posterior Computation via Gibbs Sampling
CGHS exploits nodewise regression for efficient Gibbs sampling. The reparameterization

$$z_{ij} \mid z_{i,-j} \;\sim\; \mathcal{N}\big(z_{i,-j}^{\top} \beta_j,\; \sigma_j^2\big)$$

yields $\beta_j = -\omega_{-j,j} / \omega_{jj}$ and $\sigma_j^2 = 1/\omega_{jj}$. Sampling proceeds iteratively:
- Latent Imputation: If $j \in O_i$, set $z_{ij}$ to the observed value; if $j \in C_i$, sample $z_{ij}$ from its truncated-normal full conditional; if $j \in M_i$, sample from the full (untruncated) normal conditional.
- Regression Coefficients $\beta_j$: Sample as $\beta_j \sim \mathcal{N}(\mu_j, \Sigma_j)$, with $\mu_j$ and $\Sigma_j$ computed from the design matrix and the Horseshoe scales.
- Residual Variance $\sigma_j^2$: Inverse-Gamma update according to the nodewise residuals.
- Local/Global Scales $\{\lambda_{jk}, \tau\}$: Sample via the latent inverse-Gamma parameterization, enabling efficient mixing.
- Precision Matrix Reconstruction: Aggregate the nodewise estimates, symmetrize by averaging $\omega_{jk}$ and $\omega_{kj}$, and ensure positive-definiteness.
This block Gibbs sampler efficiently explores the posterior, with per-iteration cost dominated by the nodewise regression updates.
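The regression-coefficient step is a standard conjugate Gaussian draw. The sketch below assumes the usual Horseshoe-regression posterior $\beta \sim \mathcal{N}(A^{-1} Z^\top y,\ \sigma^2 A^{-1})$ with $A = Z^\top Z + \mathrm{diag}(1/(\tau^2 \lambda^2))$, and uses one Cholesky factor of $A$ both to solve for the mean and to generate a correlated draw; names and dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_beta(Z, y, lam2, tau2, sigma2):
    """Conjugate draw of nodewise regression coefficients (sketch).

    Posterior: beta ~ N(A^{-1} Z'y, sigma2 * A^{-1}), where
    A = Z'Z + diag(1 / (tau2 * lam2)), handled via a Cholesky factor of A.
    """
    A = Z.T @ Z + np.diag(1.0 / (tau2 * lam2))
    L = np.linalg.cholesky(A)                  # A = L L^T, L lower triangular
    # Mean: solve A mu = Z'y in two triangular solves.
    mu = np.linalg.solve(L.T, np.linalg.solve(L, Z.T @ y))
    # Draw: mu + sqrt(sigma2) * L^{-T} eps has covariance sigma2 * A^{-1}.
    eps = rng.standard_normal(len(mu))
    return mu + np.sqrt(sigma2) * np.linalg.solve(L.T, eps)

n, p = 200, 5
Z = rng.standard_normal((n, p))
beta_true = np.array([1.0, 0.0, 0.0, -0.5, 0.0])
y = Z @ beta_true + 0.1 * rng.standard_normal(n)
print(sample_beta(Z, y, lam2=np.ones(p), tau2=1.0, sigma2=0.01))
```

Reusing a single Cholesky factor for both the mean solve and the correlated noise is the standard trick that keeps this step cheap inside the Gibbs loop.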
5. Theoretical Properties
Under standard high-dimensional conditions (bounded eigenvalues, sparsity of the true precision matrix $\Omega_0$, and a mild curvature assumption), the tempered posterior

$$\pi_{n,\alpha}(\Omega \mid y) \;\propto\; L_n(\Omega)^{\alpha}\, \pi(\Omega), \qquad \alpha \in (0,1),$$

concentrates around the true precision matrix at a rate $\varepsilon_n$ identical to rates obtained in the uncensored graphical model regime. Precisely, for any $\alpha \in (0,1)$,

$$\mathbb{E}\left[\int D_{\alpha}(\Omega, \Omega_0)\, \pi_{n,\alpha}(d\Omega \mid y)\right] \;\lesssim\; \varepsilon_n^2,$$

where $D_{\alpha}$ is the $\alpha$-Rényi divergence between the corresponding Gaussian models. This result extends to arbitrary missingness as well. Proofs leverage Taylor expansions of the likelihood, control over Kullback–Leibler neighborhoods under Horseshoe priors, and general concentration results for tempered posteriors.
6. Empirical Studies and Methodological Comparisons
CGHS has been benchmarked against the penalized censored graphical lasso (cglasso, Augugliaro et al.) in two canonical precision matrix regimes:
- Tridiagonal (chain) structure: nonzero entries confined to the main diagonal and the first off-diagonals; all other off-diagonal entries zero.
- Block structure: a fully connected block of variables, zeros elsewhere.
Studies varied the dimension $p$, sample size $n$, and censoring/missingness rates. Metrics included the squared Frobenius error $\|\hat{\Omega} - \Omega_0\|_F^2$, true positive rate (TPR), and false discovery rate (FDR). CGHS consistently achieved lower estimation error and higher TPR with near-zero FDR, especially as dimensionality or missingness increased. In the chain-graph regime, cglasso frequently failed to recover the target sparsity pattern.
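These metrics can be computed as follows; the thresholding rule for declaring an edge and the small example matrices are illustrative choices, not taken from the study:

```python
import numpy as np

def graph_metrics(Omega_hat, Omega_true, tol=1e-8):
    """Squared Frobenius error, TPR, and FDR for off-diagonal edge recovery."""
    frob2 = np.sum((Omega_hat - Omega_true) ** 2)
    off = ~np.eye(Omega_true.shape[0], dtype=bool)     # off-diagonal mask
    est = np.abs(Omega_hat[off]) > tol                  # estimated edges
    true = np.abs(Omega_true[off]) > tol                # true edges
    tpr = (est & true).sum() / max(true.sum(), 1)
    fdr = (est & ~true).sum() / max(est.sum(), 1)
    return frob2, tpr, fdr

Omega_true = np.array([[1.0, 0.5, 0.0],
                       [0.5, 1.0, 0.5],
                       [0.0, 0.5, 1.0]])   # chain (tridiagonal) graph
Omega_hat = np.array([[1.1, 0.4, 0.0],
                      [0.4, 0.9, 0.6],
                      [0.0, 0.6, 1.0]])
print(graph_metrics(Omega_hat, Omega_true))
```

Note that for a posterior-mean estimate one would first threshold or use credible intervals to declare edges; the hard `tol` cutoff here only keeps the example short.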
7. Implementation Details
- Sampling Algorithms: Truncated-normal values for censored entries are sampled via inverse-CDF methods, with exponential tilting for accuracy in extreme tails.
- Linear Algebra Optimizations: Nodewise regressions employ Cholesky decompositions for computational efficiency.
- Priors: Residual variance hyperparameters ensure weak informativeness.
- Convergence Diagnostics: Empirical traces, autocorrelation, and effective sample size diagnostics show rapid mixing with burn-in periods of approximately 1,000 iterations out of 5,000.
- Computational Complexity: The dominant per-iteration cost arises from the nodewise regression steps.
- Software: The R package GHScenmis (https://github.com/tienmt/ghscenmis) supports both censored (cenGHS_censored) and missing (cenGHS_missing) data, providing practical tools for application.
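The basic inverse-CDF step mentioned under Sampling Algorithms can be sketched as below; plain inversion loses accuracy deep in the tails, which is exactly where the exponential-tilting refinement matters. The function name is illustrative:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

def rtruncnorm_left(mu, sd, upper, size):
    """Sample N(mu, sd^2) truncated to (-inf, upper] by inversion:
    draw uniforms over the CDF mass below the bound, then apply Phi^{-1}."""
    p_upper = norm.cdf((upper - mu) / sd)   # mass below the truncation point
    u = rng.uniform(0.0, p_upper, size)
    return mu + sd * norm.ppf(u)

x = rtruncnorm_left(mu=0.0, sd=1.0, upper=-2.0, size=1000)
print(x.max())
```

Every draw respects the bound by construction; for bounds many standard deviations into the tail, `p_upper` underflows and a tilted-exponential proposal becomes necessary.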
CGHS thus offers a principled, efficient, and theoretically robust approach for Bayesian graphical model estimation under censoring and missingness, extending shrinkage benefits of the Horseshoe prior to settings where frequentist Lasso-based methods are inadequate (Mai et al., 10 Jan 2026).