Fast and Scalable Inference of Multi-Sample Cancer Lineages (1412.8574v1)

Published 30 Dec 2014 in cs.CE and q-bio.GN

Abstract: Somatic variants can be used as lineage markers for the phylogenetic reconstruction of cancer evolution. Since somatic phylogenetics is complicated by sample heterogeneity, novel specialized tree-building methods are required for cancer phylogeny reconstruction. We present LICHeE (Lineage Inference for Cancer Heterogeneity and Evolution), a novel method that automates the phylogenetic inference of cancer progression from multiple somatic samples. LICHeE uses variant allele frequencies of SSNVs obtained by deep sequencing to reconstruct multi-sample cell lineage trees and infer the subclonal composition of the samples. LICHeE is open-sourced and available at http://viq854.github.io/lichee.

Authors (6)
  1. Victoria Popic (1 paper)
  2. Raheleh Salari (2 papers)
  3. Iman Hajirasouliha (5 papers)
  4. Dorna Kashef-Haghighi (1 paper)
  5. Robert B. West (1 paper)
  6. Serafim Batzoglou (5 papers)
Citations (190)

Summary


The paper presents LICHeE (Lineage Inference for Cancer Heterogeneity and Evolution), a computational method that automates the reconstruction of cancer cell lineages. It is designed for multiple tumor samples taken from the same patient, for example from different regions of a tumor or at different time points, and provides insight into cancer progression and heterogeneity. LICHeE uses the variant allele frequencies (VAFs) of somatic single nucleotide variants (SSNVs), measured by deep sequencing in each sample, to infer lineage trees and to decompose each sample into distinct subclones.
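As a minimal illustration of the input LICHeE works from, the sketch below computes the VAF of one SSNV in several samples from hypothetical read counts and converts it into an approximate fraction of cells carrying the variant. The `SiteCounts` type and the `2 × VAF` conversion are assumptions for illustration, not LICHeE's input format. The factor of two holds only for a heterozygous variant in a diploid region with no copy-number change.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Read counts for one SSNV in one sample (hypothetical input format)."""
    ref_reads: int
    alt_reads: int


def vaf(site: SiteCounts) -> float:
    """Variant allele frequency: fraction of reads that carry the variant allele."""
    depth = site.ref_reads + site.alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return site.alt_reads / depth


def cell_fraction(v: float) -> float:
    """Approximate fraction of cells carrying the SSNV.

    Assumes a heterozygous variant in a diploid region with no copy-number
    change, so each carrier cell contributes one variant allele out of two.
    """
    return min(1.0, 2.0 * v)


# One SSNV observed at 200x depth in three samples from the same patient.
samples = [SiteCounts(180, 20), SiteCounts(150, 50), SiteCounts(200, 0)]
vafs = [vaf(s) for s in samples]
fractions = [cell_fraction(v) for v in vafs]
print(vafs)       # [0.1, 0.25, 0.0]
print(fractions)  # [0.2, 0.5, 0.0]
```

A VAF of zero in the third sample would mark this SSNV as absent there, which is the kind of per-sample presence pattern the rest of the pipeline groups on.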

Key Methodological Insights

The primary challenge addressed by LICHeE is that somatic phylogenetics is complicated by tumor heterogeneity. Traditional tree-building methods do not adequately account for the stochastic nature of somatic mutations or for the presence of distinct subclonal populations within each tumor sample: they typically treat a sample as a single leaf of the tree, whereas a tumor sample is usually a mixture of cells from several subclones. LICHeE handles this with a pipeline that (1) partitions SSNVs into groups according to the set of samples in which they are present, (2) clusters the SSNVs within each group by their VAFs using Gaussian Mixture Models (GMMs), and (3) constructs an evolutionary constraint network. This network is a directed acyclic graph whose edges encode possible precedence (ancestor-descendant) relationships among SSNV clusters.
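The first stages of the pipeline can be sketched as follows. This is a simplified illustration, not LICHeE's implementation: the presence cutoff `MIN_VAF_PRESENT`, the tolerance `eps`, and all function names are hypothetical, and the GMM step is replaced by treating each occurrence-profile group as a single cluster summarized by its per-sample mean VAF. An edge u → v is added when u is present in every sample where v is present and u's VAF is at least v's (within `eps`) in each sample.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical presence cutoff; LICHeE has its own configurable thresholds.
MIN_VAF_PRESENT = 0.05


def occurrence_profile(vafs, threshold=MIN_VAF_PRESENT):
    """Binary string with '1' where the SSNV counts as present in that sample."""
    return "".join("1" if v >= threshold else "0" for v in vafs)


def partition_by_profile(ssnv_vafs):
    """Group SSNV names by their occurrence profile across samples."""
    groups = defaultdict(list)
    for name, vafs in ssnv_vafs.items():
        groups[occurrence_profile(vafs)].append(name)
    return dict(groups)


def centroid(names, ssnv_vafs):
    """Per-sample mean VAF of a group (stand-in for GMM cluster centroids)."""
    n_samples = len(ssnv_vafs[names[0]])
    return [sum(ssnv_vafs[s][i] for s in names) / len(names)
            for i in range(n_samples)]


def is_strict_superset(pu, pv):
    """True if profile pu is present wherever pv is, and in more samples."""
    covers = all(a == "1" for a, b in zip(pu, pv) if b == "1")
    return covers and pu != pv


def constraint_network(clusters, eps=0.02):
    """Edges (u, v) meaning 'cluster u may be an ancestor of cluster v'.

    clusters maps an id to (profile, centroid). Along every edge the pair
    (samples present, total VAF) strictly decreases, so the graph is acyclic.
    A 'root' node standing for normal cells may precede any cluster.
    """
    edges = {("root", c) for c in clusters}
    for u, v in permutations(clusters, 2):
        pu, fu = clusters[u]
        pv, fv = clusters[v]
        ordered = is_strict_superset(pu, pv) or (pu == pv and sum(fu) > sum(fv))
        if ordered and all(x + eps >= y for x, y in zip(fu, fv)):
            edges.add((u, v))
    return edges


# Toy VAFs for five SSNVs in three samples.
ssnv_vafs = {
    "m1": [0.40, 0.38, 0.42], "m2": [0.42, 0.40, 0.40],  # present in all
    "m3": [0.20, 0.25, 0.00], "m4": [0.22, 0.23, 0.01],  # samples 1-2 only
    "m5": [0.00, 0.00, 0.30],                            # sample 3 only
}
groups = partition_by_profile(ssnv_vafs)
clusters = {p: (p, centroid(names, ssnv_vafs)) for p, names in groups.items()}
print(sorted(constraint_network(clusters)))
# [('111', '001'), ('111', '110'), ('root', '001'), ('root', '110'), ('root', '111')]
```

With these toy data, the trunk cluster `111` may precede both branch clusters, while `110` and `001` cannot precede each other because neither profile contains the other.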

The method's efficiency comes from using the evolutionary constraint network to restrict the search space of candidate lineage trees, and from applying VAF constraints that keep only biologically consistent trees; for example, in each sample the VAFs of a cluster's children should not sum to more than the cluster's own VAF. LICHeE is highly scalable, reconstructing lineage trees in seconds from datasets of hundreds of SSNVs. The method is benchmarked on both simulated data and real cancer datasets, showing high sensitivity in SSNV group assignment and accurate reconstruction of tree topology.
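Given a constraint network, candidate lineage trees can be obtained by choosing one allowed parent for every cluster; because the network is acyclic and only `root` has no parents, every such choice yields a tree rooted at `root`. The sketch below enumerates all choices and keeps those in which, in every sample, the VAFs of each node's children sum to at most the node's own VAF. It is an illustration under stated assumptions (the root's VAF is set to 0.5 in every sample, the tolerance `eps` is arbitrary, and the names are hypothetical). Exhaustive enumeration is only practical for small networks, so this should not be read as LICHeE's actual search procedure.

```python
from itertools import product

# Assumption: the root (normal cells) gets VAF 0.5 in every sample, the value
# a heterozygous variant carried by every cell would have.
ROOT_VAF = 0.5


def satisfies_sum_rule(parent_of, vafs, eps=0.02):
    """True if, in every sample, the VAFs of each node's children sum to at
    most that node's own VAF plus a tolerance eps."""
    n_samples = len(next(iter(vafs.values())))
    child_sum = {}
    for child, parent in parent_of.items():
        sums = child_sum.setdefault(parent, [0.0] * n_samples)
        for i in range(n_samples):
            sums[i] += vafs[child][i]
    for parent, sums in child_sum.items():
        limit = [ROOT_VAF] * n_samples if parent == "root" else vafs[parent]
        if any(s > lim + eps for s, lim in zip(sums, limit)):
            return False
    return True


def valid_trees(candidate_parents, vafs, eps=0.02):
    """Yield child -> parent maps that pick one allowed parent per cluster and
    pass the sum rule. Since the candidate edges come from an acyclic network
    rooted at 'root', every choice is a tree. Exhaustive, so small inputs only."""
    nodes = sorted(candidate_parents)
    for choice in product(*(candidate_parents[n] for n in nodes)):
        parent_of = dict(zip(nodes, choice))
        if satisfies_sum_rule(parent_of, vafs, eps):
            yield parent_of


# Cluster centroid VAFs in two samples, and the parents each cluster may have.
vafs = {"A": [0.40, 0.40], "B": [0.25, 0.10], "C": [0.15, 0.25]}
candidate_parents = {"A": ["root"], "B": ["root", "A"], "C": ["root", "A"]}
print(list(valid_trees(candidate_parents, vafs)))
# [{'A': 'root', 'B': 'A', 'C': 'A'}]
```

Here the tree with A as the trunk and B and C as sibling branches is the only one that fits: attaching B or C directly to the root would push the root's children above a total VAF of 0.5 (more than all cells) in at least one sample.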

Experimental Validation

LICHeE is evaluated on simulated lineage trees and on several publicly available cancer datasets. In simulation, it assigns SSNVs to the correct groups with high sensitivity (94-99%), even at a lower coverage of 100x, and the reconstructed trees preserve ancestor-descendant relationships with few reversal errors. Simulations that add CNV-induced variance to the VAFs indicate that the method is robust to such noise, which supports its applicability to highly complex cancer genomes.
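The simulation metrics can be made concrete with simple definitions. The sketch below, using hypothetical function names, counts a reversal error for each true ancestor-descendant pair that appears in the opposite order in the inferred tree, and measures assignment sensitivity as the fraction of SSNVs placed in their true group. These definitions are illustrative assumptions and may differ in detail from those used in the paper.

```python
def ancestor_pairs(parent_of):
    """All (ancestor, descendant) pairs implied by a child -> parent map."""
    pairs = set()
    for node in parent_of:
        p = parent_of.get(node)
        while p is not None:
            pairs.add((p, node))
            p = parent_of.get(p)
    return pairs


def reversal_errors(true_parent_of, inferred_parent_of):
    """Count true ancestor-descendant pairs that the inferred tree reverses."""
    inferred = ancestor_pairs(inferred_parent_of)
    return sum(1 for anc, desc in ancestor_pairs(true_parent_of)
               if (desc, anc) in inferred)


def assignment_sensitivity(true_group, inferred_group):
    """Fraction of SSNVs assigned to their true group (occurrence profile)."""
    correct = sum(1 for ssnv, group in true_group.items()
                  if inferred_group.get(ssnv) == group)
    return correct / len(true_group)


true_tree = {"A": "root", "B": "A", "C": "B"}      # root -> A -> B -> C
inferred_tree = {"A": "root", "C": "A", "B": "C"}  # root -> A -> C -> B
print(reversal_errors(true_tree, inferred_tree))   # 1 (B and C swapped)

true_group = {"m1": "111", "m2": "111", "m3": "110", "m4": "001"}
inferred_group = {"m1": "111", "m2": "111", "m3": "110", "m4": "011"}
print(assignment_sensitivity(true_group, inferred_group))  # 0.75
```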

On real datasets, such as clear cell renal cell carcinoma (ccRCC) and high-grade serous ovarian cancer (HGSC), LICHeE produces lineage trees that often agree with those obtained through extensive manual analysis, and it provides additional insight into tumor heterogeneity. For instance, in the ccRCC data LICHeE identifies additional heterogeneous subclones not revealed by a traditional maximum parsimony approach, and in the HGSC data it highlights shortcomings of other methods (e.g., neighbor-joining with Pearson correlation distances) in recovering lineages supported by the evidence. Finally, on breast cancer xenograft (xenoengraftment) data, the lineage trees derived by LICHeE agree well with reconstructions from single-cell sequencing, supporting the method's consistency and reliability.
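For comparison, the neighbor-joining baseline mentioned in the HGSC analysis starts from a distance between every pair of samples. The sketch below assumes the common form of Pearson correlation distance, d = 1 - r, where r is the correlation between two samples' VAF vectors; the exact distance used in the comparison may differ, and only the distance step is shown, not neighbor-joining itself. Such a distance summarizes each sample as a single point (one leaf of the tree) rather than as a mixture of subclones.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def correlation_distances(sample_vafs):
    """Distance d = 1 - r between every pair of samples' VAF vectors."""
    names = list(sample_vafs)
    return {(a, b): 1.0 - pearson(sample_vafs[a], sample_vafs[b])
            for a in names for b in names}


# VAFs of the same four SSNVs in three samples.
sample_vafs = {
    "S1": [0.40, 0.20, 0.00, 0.00],
    "S2": [0.38, 0.22, 0.00, 0.01],
    "S3": [0.40, 0.00, 0.30, 0.00],
}
d = correlation_distances(sample_vafs)
print(round(d[("S1", "S2")], 3), round(d[("S1", "S3")], 3))
```

S1 and S2 share nearly the same VAF pattern and end up close, while S3 is farther away; the resulting matrix would then be the input to a neighbor-joining step.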

Implications and Future Directions

LICHeE is a significant advance in multi-sample cancer phylogenetic inference, providing a scalable approach for analyzing large cancer sequencing datasets. Its VAF constraints directly reflect the biology of cancer progression and sample heterogeneity, which may improve understanding of tumor evolution and, in the longer term, support the development of targeted cancer therapies.

Future work could extend LICHeE to lower-coverage sequencing data, incorporate aneuploidies and larger CNVs directly into the model, and draw on more comprehensive genomic data when reconstructing cancer evolution. As cancer genomics research continues to expand, methods like LICHeE are likely to be important for understanding the complexity of cancer evolution and for improving therapeutic strategies.
