TIGRESS: Trustful Inference of Gene REgulation using Stability Selection (1205.1181v1)

Published 6 May 2012 in stat.ML and q-bio.QM

Abstract: Inferring the structure of gene regulatory networks (GRN) from gene expression data has many applications, from the elucidation of complex biological processes to the identification of potential drug targets. It is however a notoriously difficult problem, for which the many existing methods reach limited accuracy. In this paper, we formulate GRN inference as a sparse regression problem and investigate the performance of a popular feature selection method, least angle regression (LARS) combined with stability selection. We introduce a novel, robust and accurate scoring technique for stability selection, which improves the performance of feature selection with LARS. The resulting method, which we call TIGRESS (Trustful Inference of Gene REgulation using Stability Selection), was ranked among the top methods in the DREAM5 gene network reconstruction challenge. We investigate in depth the influence of the various parameters of the method and show that a fine parameter tuning can lead to significant improvements and state-of-the-art performance for GRN inference. TIGRESS reaches state-of-the-art performance on benchmark data. This study confirms the potential of feature selection techniques for GRN inference. Code and data are available on http://cbio.ensmp.fr/~ahaury. Running TIGRESS online is possible on GenePattern: http://www.broadinstitute.org/cancer/software/genepattern/.



Summary

The paper introduces TIGRESS, a method that recasts gene regulatory network inference as a sparse regression problem using LARS and stability selection.
The approach achieves high precision on benchmark datasets, notably in DREAM5 challenges, by robustly evaluating transcription factor influence.
TIGRESS’s stability scoring mitigates the variability of feature selection, while the limits of its linear model on in vivo data point to non-linear modeling as a direction for future improvement.

TIGRESS: Trustful Inference of Gene Regulation using Stability Selection

The paper "TIGRESS: Trustful Inference of Gene Regulation using Stability Selection" details a method for inferring gene regulatory networks (GRNs) from gene expression data. GRN inference is essential to understanding biological processes and has applications in drug discovery and disease modeling. Despite considerable advances, the problem remains difficult, largely because of the vast number of possible interactions and the indirect relationships present in gene expression data.

The core contribution of this work is TIGRESS, a method that formulates GRN inference as a sparse regression problem. TIGRESS pairs Least Angle Regression (LARS) with stability selection and introduces a new scoring technique for stability selection to improve performance. The method ranked among the top methods in the DREAM5 gene network reconstruction challenge.

Methodological Insights

TIGRESS approaches GRN inference by examining gene expression data to identify the transcription factors (TFs) that potentially regulate each target gene (TG). This is achieved through a robust feature selection process. The problem is framed as predicting which TFs influence the expression profile of each TG, which reduces GRN inference to a series of regression problems, one per target gene. LARS, a computationally efficient regression technique, lets the method select a small set of relevant TFs for each target while disregarding TFs whose association with the target reflects only an indirect relationship.
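One way to write the per-target regression in symbols is shown below. The notation is chosen here for illustration and is not quoted from the paper: $x_g$ is the expression profile of target gene $g$ across experiments, $T_g$ is the set of candidate TFs for $g$ (excluding $g$ itself), and $\beta_{t,g}$ is the regression coefficient of TF $t$.

```latex
x_g = \sum_{t \in T_g} \beta_{t,g}\, x_t + \varepsilon_g ,
\qquad \text{with most } \beta_{t,g} = 0 .
```

Under this reading, a non-zero $\beta_{t,g}$ corresponds to a predicted edge from TF $t$ to gene $g$, and the sparsity assumption encodes the expectation that each gene has only a few direct regulators.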

One of the paper's notable innovations is in stability selection. By running LARS many times on randomly perturbed versions of the data and recording how often each TF is selected, TIGRESS becomes less sensitive to the variability in feature selection that correlated genes cause. The newly proposed scoring metric summarizes how stably each feature is selected across these runs, improving the robustness and accuracy of the inferred network.
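The perturb-and-select loop can be sketched for a single target gene as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a greedy forward-selection routine stands in for LARS to keep the example dependency-light, the perturbation (random half-splits plus random column weights in `[alpha, 1]`) is one common form of stability selection, and all function and parameter names are hypothetical.

```python
import numpy as np

def forward_selection_order(X, y, n_steps):
    """Greedy stand-in for LARS (an assumption, not the paper's solver).

    At each step, pick the unused column with the largest absolute inner
    product with the residual, then refit least squares on the picked columns.
    Requires n_steps <= number of columns.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    residual = yc.copy()
    picked = []
    for _ in range(n_steps):
        score = np.abs(Xc.T @ residual)
        if picked:
            score[picked] = -np.inf  # never pick the same column twice
        picked.append(int(np.argmax(score)))
        coef, *_ = np.linalg.lstsq(Xc[:, picked], yc, rcond=None)
        residual = yc - Xc[:, picked] @ coef
    return picked

def stability_scores(X, y, n_steps=3, n_resamples=100, alpha=0.2, seed=0):
    """Return (original, area) scores for each candidate TF (column of X)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Xs = (X - X.mean(axis=0)) / std  # standardize before reweighting
    # counts[l, t]: runs in which column t was among the first l+1 selected
    counts = np.zeros((n_steps, p))
    runs = 0
    for _ in range(n_resamples):
        perm = rng.permutation(n)
        weights = rng.uniform(alpha, 1.0, size=p)  # random column rescaling
        for half in (perm[: n // 2], perm[n // 2:]):
            order = forward_selection_order(Xs[half] * weights, y[half], n_steps)
            for step, t in enumerate(order):
                counts[step:, t] += 1
            runs += 1
    freq = counts / runs
    original = freq[-1]       # selection frequency after all n_steps
    area = freq.mean(axis=0)  # average frequency over steps 1..n_steps
    return original, area
```

Calling `stability_scores(X, y)` with `X` holding TF expression (experiments by TFs) and `y` holding one target gene's expression yields one score per TF. Repeating this for every target gene and sorting all TF-target pairs by score gives a ranked list of candidate edges.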

Performance and Comparative Analysis

TIGRESS was assessed on several benchmark datasets, including the DREAM5 networks and empirical datasets for E. coli and S. cerevisiae. The results showed competitive performance, especially on in silico networks, where it inferred regulatory interactions with high precision. Using the area under the stability curves as the score proved advantageous: it made the method less sensitive to parameter choices and its performance more reliable.
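A compact way to state the area-based score is given below. It assumes that $F(t, l)$ denotes the fraction of perturbed runs in which TF $t$ is among the first $l$ features selected, and that $L$ is the total number of selection steps; the symbols are illustrative rather than the paper's exact notation.

```latex
s_{\text{original}}(t) = F(t, L),
\qquad
s_{\text{area}}(t) = \frac{1}{L} \sum_{l=1}^{L} F(t, l)
```

As a worked example with $L = 3$, suppose TF $t_1$ has frequencies $(0.9, 0.9, 0.9)$ and TF $t_2$ has $(0, 0, 0.9)$. Both get an original score of $0.9$, but the area scores are $0.9$ and $0.3$ respectively, so the area score favors the TF that is selected early and consistently, and it changes less abruptly when $L$ is varied.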

Compared with other GRN inference methods such as GENIE3 and ARACNE, TIGRESS was strong at ranking direct interactions highly, although the paper notes that it underperforms on in vivo datasets. This is partially attributed to the linearity assumption underlying LARS, which suggests that future work might explore non-linear models to better capture the complexity of living organisms.

Implications and Future Directions

The analysis of TIGRESS's errors, such as frequently predicting a direct regulatory link between sibling genes (genes that share a regulator), points to areas for refinement. These errors might be reduced by incorporating additional biological constraints or prior knowledge, possibly leading to more sophisticated feature selection schemes.

The TIGRESS framework, by leveraging advanced regression techniques coupled with randomization-based scoring, stands as a significant contribution to GRN inference methodologies. Its design principles can inspire future developments in the domain, particularly concerning the integration of non-linear modeling approaches and the extension of scoring methodologies to accommodate more complex regulatory scenarios. As the field advances, further empirical validation on a broader set of in vivo networks will elucidate the full scope of TIGRESS's applicability and efficacy.
