Efficient detection of generalized bibubbles in bidirected graphs

Develop an efficient algorithm to identify all generalized bibubbles in a bidirected gene graph, i.e., minimal pairs of oriented genes (x, y) whose enclosed vertex set satisfies the reachability-symmetry and minimality conditions of the paper's definition. The method should scale to very large graphs with tens of millions of nodes, such as minigraph-cactus and PGGB graphs.

Background

The paper introduces a rigorous definition of generalized bibubbles to capture local gene order, copy-number, and orientation variations directly on bidirected gene graphs. Existing superbubble-finding algorithms apply to directed graphs and do not generalize cleanly to bidirected graphs, especially in the presence of inversions.
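To make the difference from a directed graph concrete, here is a minimal sketch of a bidirected gene graph in Python. It is not the paper's algorithm or pangene's implementation, and it does not test the bibubble conditions themselves. The class `BidirectedGraph`, the helper `flip`, and the toy genes A to D are hypothetical names chosen for illustration. The only assumption is the usual bidirected convention: every arc u -> v implies the reverse-complement arc flip(v) -> flip(u).

```python
from collections import defaultdict, deque


def flip(v):
    """Return the same gene on the opposite strand."""
    gene, strand = v
    return (gene, "-" if strand == "+" else "+")


class BidirectedGraph:
    """Toy bidirected graph whose vertices are oriented genes (name, strand)."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_arc(self, u, v):
        # Store the arc and its implied reverse-complement arc.
        self.adj[u].add(v)
        self.adj[flip(v)].add(flip(u))

    def reachable(self, start):
        """Breadth-first search over oriented vertices, starting at `start`."""
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in self.adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        return seen


# Toy example: one path goes A+ B+ D+, the other traverses gene C inverted.
g = BidirectedGraph()
g.add_arc(("A", "+"), ("B", "+"))
g.add_arc(("B", "+"), ("D", "+"))
g.add_arc(("A", "+"), ("C", "-"))
g.add_arc(("C", "-"), ("D", "+"))

forward = g.reachable(("A", "+"))   # walk from A on the forward strand
backward = g.reachable(("D", "-"))  # walk from D on the reverse strand
forward_genes = {gene for gene, _ in forward}
backward_genes = {gene for gene, _ in backward}
```

In this toy graph the two walks visit the same genes with opposite orientations. A directed-graph routine that ignores strands cannot express this, and a search on a bidirected graph may additionally reach both orientations of one gene when inversions are present.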

The authors present practical methods based on net graphs and cycle equivalence and implement a working approach that performs well for human-scale gene graphs (~20,000 genes). However, they note that this implementation will be slow on the much larger graphs produced by whole-genome pangenome tools such as minigraph-cactus or PGGB, which contain tens of millions of nodes, so efficient identification remains an open problem.

References

On the theoretical side, this article presented a rigorous definition of “bubble” in a bidirected graph but it did not find an efficient algorithm to identify such generalized bibubbles. While the current implementation in pangene works for gene graphs containing ∼20,000 genes, it will be slow for a minigraph-cactus or PGGB graph that contains tens of millions of nodes. How to efficiently identify generalized bibubbles remains an open and critical problem.

Exploring gene content with pangene graphs (2402.16185 - Li et al., 25 Feb 2024) in Discussions