UniIF: Unified Molecule Inverse Folding (2405.18968v1)
Abstract: Molecule inverse folding has been a long-standing challenge in chemistry and biology, with the potential to revolutionize drug discovery and materials science. Although specialized models have been proposed for different small- and macro-molecules, few attempts have been made to unify the learning process, resulting in redundant effort. Complementary to recent advances in molecular structure prediction, such as RoseTTAFold All-Atom and AlphaFold 3, we propose UniIF, a unified model for the inverse folding of all molecules. We achieve this unification at two levels: 1) Data level: we propose a unified block graph data form for all molecules, including local frame building and geometric feature initialization. 2) Model level: we introduce a geometric block attention network, comprising geometric interaction, interactive attention, and virtual long-term dependency modules, to capture the 3D interactions of all molecules. Through comprehensive evaluations across tasks such as protein design, RNA design, and material design, we demonstrate that our method surpasses state-of-the-art methods on all tasks. UniIF offers a versatile and effective solution for general molecule inverse folding.
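The "local frame building" step in the data-level unification can be illustrated with a minimal sketch: one common way to attach an SE(3)-invariant frame to a block (e.g. a protein residue) is Gram–Schmidt orthogonalization over three of its atom coordinates. The function below is a hypothetical helper under that assumption; UniIF's exact frame construction may differ.

```python
import numpy as np

def local_frame(a, b, c):
    """Build an orthonormal local frame for one block from three of its
    atom coordinates (e.g. N, CA, C for a protein residue) via
    Gram-Schmidt. Returns (R, t): a 3x3 rotation matrix whose columns
    are the frame axes, and the frame origin t.
    Hypothetical illustration; not UniIF's exact construction."""
    a, b, c = (np.asarray(x, dtype=float) for x in (a, b, c))
    u = a - b                        # first in-plane direction
    v = c - b                        # second in-plane direction
    e1 = u / np.linalg.norm(u)
    v = v - np.dot(v, e1) * e1       # remove the component along e1
    e2 = v / np.linalg.norm(v)
    e3 = np.cross(e1, e2)            # right-handed completion
    R = np.stack([e1, e2, e3], axis=-1)
    return R, b                      # rotation and origin (centered at b)
```

Expressing a neighboring block's coordinates as `R.T @ (x - t)` then yields features that are invariant to global rotations and translations, which is what lets one graph form serve proteins, RNA, and materials alike.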
- Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, pages 1–3, 2024.
- On the bottleneck of graph neural networks and its practical implications. In International Conference on Learning Representations, 2020.
- Design of RNAs: comparing programs for inverse RNA folding. Briefings in Bioinformatics, 19(2):350–358, 2018.
- Robust deep learning-based protein sequence design using ProteinMPNN. bioRxiv, 2022.
- On over-squashing in message passing neural networks: The impact of width, depth, and topology. In International Conference on Machine Learning, pages 7865–7885. PMLR, 2023.
- CHILI: Chemically-informed large-scale inorganic nanomaterials dataset for advancing graph machine learning. arXiv preprint arXiv:2402.13221, 2024.
- Graph U-Nets. In International Conference on Machine Learning, pages 2083–2092. PMLR, 2019.
- KW-Design: Pushing the limit of protein design via knowledge refinement. In The Twelfth International Conference on Learning Representations, 2023.
- AlphaDesign: A graph protein design method and benchmark on AlphaFoldDB. arXiv preprint arXiv:2202.01079, 2022.
- PiFold: Toward effective and efficient protein inverse folding. In International Conference on Learning Representations, 2023.
- ProteinInvBench: Benchmarking protein inverse folding on diverse tasks, models, and metrics. Advances in Neural Information Processing Systems, 36, 2024.
- Directional message passing for molecular graphs. arXiv preprint arXiv:2003.03123, 2020.
- On the trade-off between over-smoothing and over-squashing in deep graph neural networks. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 566–576, 2023.
- Accelerating the prediction of stable materials with machine learning. Nature Computational Science, 3(11):934–945, 2023.
- Inductive representation learning on large graphs. Advances in neural information processing systems, 30, 2017.
- Finding nature’s missing ternary oxide compounds using machine learning and density functional theory. Chemistry of Materials, 22(12):3762–3767, 2010.
- Learning inverse folding from millions of predicted structures. bioRxiv, 2022.
- RiboDiffusion: Tertiary structure-based RNA inverse folding with generative diffusion models. bioRxiv, 2024.
- Generative models for graph-based protein design. Advances in neural information processing systems, 32, 2019.
- Equivariant pretrained transformer for unified geometric learning on multi-domain 3d molecules. arXiv preprint arXiv:2402.12714, 2024.
- Learning from protein structure with geometric vector perceptrons. arXiv preprint arXiv:2009.01411, 2020.
- Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589, 2021.
- Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
- Generalist equivariant transformer towards 3d molecular interaction learning. arXiv preprint arXiv:2306.01474, 2023.
- Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science, 384(6693):eadl2528, 2024.
- Materials discovery and design using machine learning. Journal of Materiomics, 3(3):159–177, 2017.
- De novo protein design using geometric vector field networks. arXiv preprint arXiv:2310.11802, 2023.
- Combinatorial screening for new materials in unconstrained composition space with machine learning. Physical Review B, 89(9):094104, 2014.
- Revisiting over-smoothing and over-squashing using Ollivier-Ricci curvature. In International Conference on Machine Learning, pages 25956–25979. PMLR, 2023.
- CATH – a hierarchic classification of protein domain structures. Structure, 5(8):1093–1109, 1997.
- Prediction of stable nitride perovskites. Chemistry of Materials, 27(17):5957–5963, 2015.
- Fast and flexible protein design using deep graph neural networks. Cell Systems, 11(4):402–411, 2020.
- Generative de novo protein design with global context. arXiv preprint arXiv:2204.10673, 2022.
- Hierarchical data-efficient representation learning for tertiary structure-based RNA design. In The Twelfth International Conference on Learning Representations, 2023.
- Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
- Predicting stable crystalline compounds using chemical similarity. npj Computational Materials, 7(1):12, 2021.
- Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics (TOG), 38(5):1–12, 2019.
- How powerful are graph neural networks? arXiv preprint arXiv:1810.00826, 2018.
- Graph neural networks are inherently good generalizers: Insights by bridging gnns and mlps. arXiv preprint arXiv:2212.09034, 2022.
- Structure-informed language models are protein designers. bioRxiv, 2023.