Multi-Fidelity Active Learning with GFlowNets (2306.11715v2)

Published 20 Jun 2023 in cs.LG and q-bio.BM

Abstract: In recent decades, the capacity to generate large amounts of data in science and engineering applications has grown steadily. Meanwhile, machine learning has matured into a suitable tool to process and utilise the available data. Nonetheless, many relevant scientific and engineering problems present challenges where current machine learning methods cannot yet efficiently leverage the available data and resources. For example, in scientific discovery, we are often faced with the problem of exploring very large, structured and high-dimensional spaces. Moreover, the high-fidelity, black-box objective function is often very expensive to evaluate. Progress in machine learning methods that can efficiently tackle such challenges would help accelerate crucial application areas such as drug and materials discovery. In this paper, we propose a multi-fidelity active learning algorithm with GFlowNets as a sampler, to efficiently discover diverse, high-scoring candidates in settings where multiple approximations of the black-box function are available at lower fidelity and cost. Our evaluation on molecular discovery tasks shows that multi-fidelity active learning with GFlowNets can discover high-scoring candidates at a fraction of the budget of its single-fidelity counterpart while maintaining diversity, unlike RL-based alternatives. These results open new avenues for multi-fidelity active learning to accelerate scientific discovery and engineering design.
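
The core loop the abstract describes can be sketched as follows: a sampler proposes candidates, a cost-aware acquisition jointly selects a candidate and the fidelity at which to query it, the chosen oracle is evaluated, and the loop repeats until the budget is spent. Below is a minimal, self-contained toy in Python, assuming a 1-D objective on [0, 1]; `propose_batch` stands in for the GFlowNet sampler, `acquisition` for a cost-aware acquisition function, and `toy_oracle` for the family of approximations. None of these names, costs, or heuristics come from the paper.

```python
import random

# Toy multi-fidelity active learning loop. All names and heuristics
# here are illustrative placeholders, not the authors' implementation.

FIDELITY_COSTS = {0: 1.0, 1: 10.0, 2: 100.0}  # fidelity 2 = true objective

def toy_oracle(x, fidelity):
    """Stand-in black box: lower fidelities add bias and noise."""
    true_score = -(x - 0.7) ** 2
    bias = 0.05 * (2 - fidelity)
    noise = random.gauss(0.0, 0.02 * (2 - fidelity))
    return true_score + bias + noise

def propose_batch(n):
    """Placeholder for the GFlowNet sampler; a trained GFlowNet would
    draw candidates with probability proportional to their reward."""
    return [random.random() for _ in range(n)]

def acquisition(x, m, data):
    """Crude cost-aware score: novelty of x among points already queried
    at fidelity m, weighted by an information proxy, per unit cost."""
    observed = [xo for xo, _, mo in data if mo == m]
    novelty = min((abs(x - xo) for xo in observed), default=1.0)
    info = (m + 1) / 3.0  # assumption: higher fidelity is more informative
    return novelty * info / FIDELITY_COSTS[m]

budget, spent, data = 500.0, 0.0, []
while spent < budget:
    pairs = [(x, m) for x in propose_batch(16) for m in FIDELITY_COSTS]
    x, m = max(pairs, key=lambda p: acquisition(p[0], p[1], data))
    data.append((x, toy_oracle(x, m), m))
    spent += FIDELITY_COSTS[m]

# Verify the most promising observations with the expensive oracle.
for x, y, m in sorted(data, key=lambda d: d[1], reverse=True)[:3]:
    print(f"x={x:.3f} (queried at fidelity {m}): true score {toy_oracle(x, 2):+.4f}")
```

The design choice mirrored here is that candidate generation and fidelity selection are decided jointly under a shared budget, so the cheap approximations absorb most of the exploration cost and the expensive oracle is reserved for the most promising regions.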
