Solving the Tree Containment Problem Using Graph Neural Networks (2404.09812v2)
Abstract: Tree Containment is a fundamental problem in phylogenetics, useful for verifying a proposed phylogenetic network that represents the evolutionary history of certain species. It asks whether a given phylogenetic tree (for instance, one constructed from a DNA fragment showing tree-like evolution) is contained in a given phylogenetic network. In the general case, the problem is NP-complete. We propose to solve it approximately using Graph Neural Networks: we combine the given network and tree into a single network-tree graph and apply a Graph Neural Network to it. This lets us solve tree containment instances with more species than the instances in the training dataset (i.e., our algorithm has inductive learning ability). Our algorithm achieves an accuracy of over $95\%$ on tree containment instances with up to 100 leaves.
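To make the problem statement concrete, the following is a minimal brute-force sketch of an exact containment check. It is not the Graph Neural Network approach proposed in the paper. It assumes a rooted network and a rooted tree given as lists of directed `(parent, child)` edges over the same leaf labels, with no parallel edges. The function names `leaf_set`, `clusters`, and `is_contained` are hypothetical. The check tries every way of keeping one incoming edge per reticulation (a node with more than one parent) and compares the resulting leaf clusters with those of the tree, so its running time grows exponentially with the number of reticulations.

```python
from itertools import product


def leaf_set(edges):
    """Leaves are the nodes that never appear as a parent."""
    parents = {p for p, _ in edges}
    return {c for _, c in edges if c not in parents}


def clusters(edges, leaves):
    """Return the set of non-empty leaf sets found below each node."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    found = set()

    def below(node):
        if node in leaves:
            leafset = frozenset([node])
        else:
            kids = children.get(node, [])
            leafset = frozenset().union(*(below(c) for c in kids))
        if leafset:  # dead ends created by dropped edges are ignored
            found.add(leafset)
        return leafset

    child_nodes = {c for _, c in edges}
    for root in {p for p, _ in edges} - child_nodes:
        below(root)
    return found


def is_contained(network_edges, tree_edges):
    """Brute-force check: does the network display the tree?"""
    leaves = leaf_set(network_edges)
    if leaves != leaf_set(tree_edges):
        return False
    target = clusters(tree_edges, leaves)

    incoming = {}
    for edge in network_edges:
        incoming.setdefault(edge[1], []).append(edge)
    # Edges into ordinary nodes are always kept.
    fixed = [es[0] for es in incoming.values() if len(es) == 1]
    # For each reticulation, exactly one incoming edge is kept.
    choices = [es for es in incoming.values() if len(es) > 1]

    for kept in product(*choices):
        if clusters(fixed + list(kept), leaves) == target:
            return True
    return False


# Toy network with one reticulation "h" (parents "u" and "v").
network = [("r", "u"), ("r", "v"), ("u", "a"), ("u", "h"),
           ("v", "h"), ("v", "c"), ("h", "b")]
tree_ab_c = [("t0", "t1"), ("t0", "c"), ("t1", "a"), ("t1", "b")]
tree_ac_b = [("t0", "t1"), ("t0", "b"), ("t1", "a"), ("t1", "c")]

print(is_contained(network, tree_ab_c))
print(is_contained(network, tree_ac_b))
```

Comparing cluster sets avoids explicitly suppressing degree-2 nodes: a node with a single surviving child has the same cluster as that child, and a rooted tree with uniquely labelled leaves is determined by its clusters. In the toy example the network displays the tree ((a,b),c) but not ((a,c),b).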