Joint estimation of causal effects from observational and intervention gene expression data (1307.8046v1)
Abstract: Background: Inference of gene regulatory networks from transcriptomic data has been an active research area in recent years. Proposed methods are mainly based on graphical Gaussian models for observational wild-type data and provide undirected graphs that cannot accurately highlight the causal relationships among genes. In the present work, we seek to improve estimation of causal effects among genes by jointly modeling observational transcriptomic data with intervention data obtained by performing knock-outs or knock-downs on a subset of genes. By examining the impact of such expression perturbations on other genes, a more accurate reflection of regulatory relationships may be obtained than through the use of wild-type data alone. Results: Using the framework of Gaussian Bayesian networks, we propose a Markov chain Monte Carlo algorithm with a Mallows model and an analytical likelihood maximization to sample from the posterior distribution of causal node orderings and, in turn, to estimate causal effects. The main advantage of the proposed algorithm over previously proposed methods is its flexibility to accommodate any kind of intervention design, including partial or multiple knock-out experiments. Methods were compared on simulated data as well as on data from the DREAM 2007 challenge. Conclusions: The simulation study confirmed that causal orderings of genes cannot be estimated from observational data alone. The proposed algorithm was found, in most cases, to estimate causal effects more accurately than the previously proposed methods. In addition, multiple knock-outs provided valuable additional information compared to single knock-outs. The choice of optimal intervention design therefore appears to be a crucial aspect of causal inference and an interesting challenge for future research.
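For readers unfamiliar with the Mallows model mentioned in the abstract, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the standard form of the model, in which the probability of an ordering decays exponentially with its Kendall tau distance from a reference ordering. The abstract does not say how the model enters the sampler, so the names `kendall_tau`, `mallows_pmf`, and `dispersion`, as well as the gene labels, are hypothetical. Normalization is done by brute-force enumeration, which is only feasible for a handful of nodes.

```python
import math
from itertools import permutations


def kendall_tau(order_a, order_b):
    """Count the pairs of items that the two orderings rank in opposite order."""
    pos_b = {item: i for i, item in enumerate(order_b)}
    ranks = [pos_b[item] for item in order_a]
    return sum(
        1
        for i in range(len(ranks))
        for j in range(i + 1, len(ranks))
        if ranks[i] > ranks[j]
    )


def mallows_pmf(reference, dispersion):
    """Return {ordering: probability} under a Mallows model centred on `reference`.

    Assumption: P(ordering) is proportional to
    exp(-dispersion * kendall_tau(ordering, reference)).
    All orderings are enumerated, so this is only usable for small node sets.
    """
    weights = {
        perm: math.exp(-dispersion * kendall_tau(perm, reference))
        for perm in permutations(reference)
    }
    total = sum(weights.values())
    return {perm: w / total for perm, w in weights.items()}


# Hypothetical three-gene example: orderings close to the reference get more mass.
pmf = mallows_pmf(("g1", "g2", "g3"), dispersion=1.0)
```

In this sketch a larger `dispersion` concentrates the distribution around the reference ordering, while a value of zero makes all orderings equally likely.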