Evolutionary Retrosynthetic Route Planning (2310.05186v2)
Abstract: Molecular retrosynthesis is a significant and complex problem in chemistry. Traditional manual synthesis methods not only require well-trained experts but are also time-consuming. With the development of big data and machine learning, AI-based retrosynthesis is attracting more attention and has become a valuable tool for molecular retrosynthesis. At present, Monte Carlo tree search is a mainstream search framework for this problem. Nevertheless, its search efficiency is compromised by its large search space. This paper therefore proposes a novel approach to retrosynthetic route planning based on evolutionary optimization, marking the first use of an Evolutionary Algorithm (EA) in multi-step retrosynthesis. The proposed method models the retrosynthetic problem as an optimization problem and defines the corresponding search space and operators. Additionally, a parallel strategy is implemented to improve search efficiency. The new approach is applied to four case products and compared with Monte Carlo tree search. The experimental results show that, compared with Monte Carlo tree search, EA reduces the number of calls to the single-step model by an average of 53.9%. The time required to find three solutions decreases by an average of 83.9%, and the number of feasible search routes increases by 1.38 times. The source code is available at https://github.com/ilog-ecnu/EvoRRP.
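To make the idea of treating route planning as an optimization problem more concrete, the following is a minimal, generic sketch of an evolutionary search with parallel fitness evaluation. It is not the authors' implementation (see the repository above for that). Everything in it is an assumption made for illustration: molecules are stand-in integers, `toy_single_step` is a hypothetical placeholder for a learned single-step model, the value 0 plays the role of a purchasable building block, and the encoding (one gene per step, each gene selecting a candidate index) and the mutation operator are simple choices, not the paper's operators.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def toy_single_step(molecule, k=3):
    """Hypothetical stand-in for a single-step model: k candidate precursors."""
    return [max(molecule - (i + 1), 0) for i in range(k)]


def decode(target, genes):
    """Turn a gene vector into a route; each gene picks one candidate index."""
    route, mol = [target], target
    for g in genes:
        if mol == 0:  # 0 stands in for a purchasable building block
            break
        mol = toy_single_step(mol)[g]
        route.append(mol)
    return route


def fitness(target, genes):
    """Lower is better: penalise unsolved routes, prefer shorter ones."""
    route = decode(target, genes)
    solved = route[-1] == 0
    return (0 if solved else 100) + len(route)


def evolve(target, depth=6, pop_size=8, generations=10, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(3) for _ in range(depth)] for _ in range(pop_size)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(generations):
            # Evaluate the whole population in parallel.
            scores = list(pool.map(lambda g: fitness(target, g), pop))
            ranked = [g for _, g in sorted(zip(scores, pop), key=lambda t: t[0])]
            parents = ranked[: pop_size // 2]  # elitist truncation selection
            children = []
            for p in parents:
                child = p[:]
                child[rng.randrange(depth)] = rng.randrange(3)  # point mutation
                children.append(child)
            pop = parents + children
        scores = list(pool.map(lambda g: fitness(target, g), pop))
    best_genes = min(zip(scores, pop), key=lambda t: t[0])[1]
    return best_genes, decode(target, best_genes)


if __name__ == "__main__":
    genes, route = evolve(9)
    print("genes:", genes)
    print("route:", route)
```

In this sketch the expensive step is `toy_single_step`, so evaluating individuals concurrently mirrors, in a very loose way, why a parallel strategy can shorten wall-clock search time. A real system would replace the toy model with a trained single-step predictor and check precursors against a building-block catalogue.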