Validity of the hierarchical continuity assumption in experimental scRNA-seq data

Determine to what extent the hierarchical continuity assumption holds in experimental single-cell RNA-seq datasets. The assumption states that the mean expression vectors of parent and child nodes in a lineage tree are close in Euclidean space, and it is operationalized by the continuity loss in hierarchical k-means (h-k-means) and the hierarchical Gaussian mixture model (h-GMM). In experimental data, differentiation may not be strictly hierarchical, and the provided label hierarchy is manually curated.

Background

The proposed methods (h-k-means and h-GMM) incorporate a continuity loss based on a developmental biology assumption that gene expression changes smoothly along a lineage tree, implying small differences between the mean expression vectors of parent and child nodes. This assumption appears satisfied in simulated datasets specifically designed to follow hierarchical differentiation dynamics.
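As a rough illustration of what a continuity term of this kind measures, the sketch below sums squared Euclidean distances between parent and child mean expression vectors over the edges of a toy lineage tree. The exact form of the loss used in h-k-means and h-GMM is not given here, so the function `continuity_penalty`, the squared-distance choice, and the toy values are all assumptions made for illustration only.

```python
import numpy as np

def continuity_penalty(node_means, edges):
    """Sum of squared Euclidean distances between parent and child means.

    node_means: dict mapping node name -> mean expression vector (hypothetical input).
    edges: list of (parent, child) pairs describing the lineage tree.
    """
    return float(
        sum(np.sum((node_means[p] - node_means[c]) ** 2) for p, c in edges)
    )

# Toy lineage (assumed values): root -> A and root -> B, three genes per node.
node_means = {
    "root": np.array([1.0, 1.0, 1.0]),
    "A": np.array([1.0, 2.0, 1.0]),  # close to its parent
    "B": np.array([4.0, 1.0, 5.0]),  # far from its parent
}
edges = [("root", "A"), ("root", "B")]

penalty = continuity_penalty(node_means, edges)
print(penalty)
```

In this toy tree, node B contributes almost all of the penalty. A regularizer of this kind would pull such a child toward its parent, which is only helpful if the underlying biology actually follows the assumed tree.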

However, the paper notes uncertainty about how well this assumption holds in real experimental datasets, where differentiation may deviate from a strict hierarchy and the lineage prior may be manually curated. Clarifying this is critical for assessing when hierarchical regularization improves performance in practice and for guiding the applicability of the proposed models to experimental scRNA-seq data.
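One informal way to probe the question on a given dataset is to compare distances between tree-adjacent nodes with distances between all other node pairs. The sketch below is not a procedure from the paper: the function name `edge_vs_nonedge_ratio`, the per-node mean vectors, and the toy tree are hypothetical, and the ratio is a descriptive summary rather than a statistical test.

```python
import itertools
import numpy as np

def edge_vs_nonedge_ratio(node_means, edges):
    """Mean parent-child distance divided by mean distance of non-adjacent pairs."""
    edge_set = {frozenset(e) for e in edges}
    edge_d, other_d = [], []
    for a, b in itertools.combinations(node_means, 2):
        d = float(np.linalg.norm(node_means[a] - node_means[b]))
        # Route the distance to the tree-edge list or the non-edge list.
        (edge_d if frozenset((a, b)) in edge_set else other_d).append(d)
    return float(np.mean(edge_d) / np.mean(other_d))

# Toy tree (assumed values): root -> A -> A1 and root -> B, two genes per node.
node_means = {
    "root": np.array([0.0, 0.0]),
    "A": np.array([1.0, 0.0]),
    "B": np.array([0.0, 1.0]),
    "A1": np.array([2.0, 0.0]),
}
edges = [("root", "A"), ("root", "B"), ("A", "A1")]

ratio = edge_vs_nonedge_ratio(node_means, edges)
print(ratio)
```

A ratio well below 1 would suggest that the curated hierarchy is at least compatible with the continuity assumption for that dataset, while a ratio near or above 1 would suggest that tree-adjacent nodes are no closer than arbitrary pairs. The sketch assumes the tree leaves at least one non-adjacent pair of nodes; otherwise the denominator is undefined.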

References

While our assumption on the distribution of data in the nodes of the hierarchy is obviously satisfied for simulated data, it is not clear how this assumption holds in experimental datasets as the differentiation process might not always lead to a hierarchy and the hierarchical prior is usually established manually and might therefore be less reliable.

Hierarchical novel class discovery for single-cell transcriptomic profiles (2409.05937 - Senoussi et al., 9 Sep 2024) in Experiments, Subsection Results (paragraph discussing simulated vs. experimental datasets)