Towards DNA-Encoded Library Generation with GFlowNets (2404.10094v1)
Abstract: DNA-encoded libraries (DELs) are a powerful approach for rapidly screening large numbers of diverse compounds. One of the key challenges in using DELs is library design, which involves choosing the building blocks that will be combinatorially assembled to produce the final library. In this paper, we consider the task of designing DELs biased towards protein-protein interaction (PPI) modulation. To this end, we evaluate several machine learning algorithms on the PPI modulation task and use them as a reward for the proposed GFlowNet-based generative approach. We additionally investigate whether structural information about building blocks can be used to design a hierarchical action space for the GFlowNet. Our results indicate that GFlowNets are a promising approach for generating diverse combinatorial library candidates.