LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery (2405.09783v1)

Published 16 May 2024 in cs.LG, cs.AI, and cs.CE

Abstract: LLMs have recently gained significant attention in scientific discovery for their extensive knowledge and advanced reasoning capabilities. However, they encounter challenges in effectively simulating observational feedback and grounding it with language to propel advancements in physical scientific discovery. Conversely, human scientists undertake scientific discovery by formulating hypotheses, conducting experiments, and revising theories through observational analysis. Inspired by this, we propose to enhance the knowledge-driven, abstract reasoning abilities of LLMs with the computational strength of simulations. We introduce Scientific Generative Agent (SGA), a bilevel optimization framework: LLMs act as knowledgeable and versatile thinkers, proposing scientific hypotheses and reasoning about discrete components, such as physics equations or molecule structures; meanwhile, simulations function as experimental platforms, providing observational feedback and optimizing via differentiability for continuous parts, such as physical parameters. We conduct extensive experiments to demonstrate our framework's efficacy in constitutive law discovery and molecular design, unveiling novel solutions that differ from conventional human expectations yet remain coherent upon analysis.

LLM and Simulation as Bilevel Optimizers: A New Approach to Physical Scientific Discovery

Background and Motivation

Human scientists pursue discovery through a familiar loop: propose hypotheses, conduct experiments, and refine theories based on observations. This paper takes inspiration from that process and automates it by combining LLMs with physical simulations. The aim is a unified, broadly applicable framework, the Scientific Generative Agent (SGA), that blends the abstract reasoning power of LLMs with the computational strength of simulations.

What is the Scientific Generative Agent (SGA)?

At its core, SGA is a bilevel optimization framework comprising two layers:

  1. Outer-Level Optimization: Here, LLMs act like experienced researchers, generating scientific hypotheses and refining them iteratively.
  2. Inner-Level Optimization: Physical simulations serve as the experimental platform, providing observational feedback and optimizing parameters through differentiability.

A practical example highlighted in the paper is constitutive law discovery: identifying, from observed data, the mathematical laws that govern a material's behavior.

How Does It Work?

Bilevel Optimization Pipeline

  • Input: An initial guess of a physical model (e.g., an elasticity model for a material).
  • Outer-Level Optimization: LLMs generate new hypotheses based on previously proposed solutions, altering both discrete components (like equations) and continuous ones (like material constants).
  • Inner-Level Optimization: Each hypothesis is evaluated in a differentiable simulation, which supplies observational feedback and refines the continuous parameters by gradient descent.

The optimization process iterates through these steps, balancing exploitation (refining known good solutions) against exploration (trying out novel ideas), as sketched below.
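
To make this concrete, here is a minimal sketch of how such a bilevel loop might be wired up. It is illustrative, not the paper's implementation: the `propose` stub stands in for the LLM call, and a plain differentiable loss stands in for the simulation.

```python
import torch

def inner_optimize(fn, init_params, obs, steps=200, lr=0.05):
    """Inner level: fit the continuous parameters of one proposed law by
    gradient descent through a differentiable loss (a stand-in for a
    differentiable simulation)."""
    params = torch.tensor(init_params, requires_grad=True)
    opt = torch.optim.Adam([params], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((fn(params, obs["x"]) - obs["y"]) ** 2).mean()
        loss.backward()
        opt.step()
    return params.detach(), loss.item()

def bilevel_search(propose, obs, iterations=5):
    """Outer level: `propose` (an LLM in the paper, a stub here) emits a
    discrete hypothesis given the history of scored attempts; the inner
    loop returns a fitted loss used as feedback for the next proposal."""
    history, best = [], None
    for _ in range(iterations):
        name, fn, init = propose(history)   # discrete part: an equation skeleton
        params, loss = inner_optimize(fn, init, obs)
        history.append((name, params, loss))
        if best is None or loss < best[2]:
            best = (name, params, loss)
    return best

# Toy data from a hidden quadratic law; a stub proposer cycles through
# candidate skeletons the way the LLM would iterate on hypotheses.
x = torch.linspace(-1, 1, 50)
obs = {"x": x, "y": 3.0 * x**2}
candidates = [
    ("linear", lambda p, x: p[0] * x, [1.0]),
    ("quadratic", lambda p, x: p[0] * x**2, [1.0]),
]
propose = lambda history: candidates[len(history) % len(candidates)]
print(bilevel_search(propose, obs))  # the quadratic skeleton wins, p[0] near 3
```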

Experimental Setup

Constitutive Law Discovery

Here, the goal is to identify both the form (discrete structure) and the characteristics (continuous parameters) of the material model from observational data. The authors use the material point method (MPM) within a differentiable simulator, so gradients of the simulation loss can flow back to the material parameters.
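
The inner level hinges on that differentiability: gradients of a simulation loss propagate back through every time step to the material parameters. The toy example below illustrates the idea with a one-parameter spring model (an illustrative stand-in, not the paper's MPM simulator), recovering a hidden stiffness by backpropagating through the integration loop.

```python
import torch

def simulate(k, x0=1.0, v0=0.0, dt=0.01, steps=200):
    """Toy differentiable 'simulation': symplectic-Euler integration of a
    spring whose stiffness k plays the role of a material parameter."""
    x = torch.tensor(x0)
    v = torch.tensor(v0)
    traj = []
    for _ in range(steps):
        a = -k * x          # constitutive relation: force = -k * x
        v = v + dt * a
        x = x + dt * v
        traj.append(x)
    return torch.stack(traj)

# Ground-truth observation generated with a hidden stiffness.
with torch.no_grad():
    observed = simulate(torch.tensor(4.0))

# Inner-level optimization: recover the stiffness via gradients
# that flow through every simulation step.
k = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.Adam([k], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = ((simulate(k) - observed) ** 2).mean()
    loss.backward()
    opt.step()
print(f"recovered k = {k.item():.3f}")  # approaches the hidden value 4.0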

Molecular Design

In this task, the objective is to discover molecular structures with specific quantum mechanical properties. The framework generates both the molecular structure and the 3D coordinates of the atoms, refining them through iterative optimization.
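
The discrete proposal here is a molecular structure (for instance, a SMILES string), which must be realized as 3D coordinates before any property can be scored. Below is a hedged sketch of that handoff using RDKit; the function name is ours, and molecular weight serves only as a placeholder objective for the quantum mechanical property a real pipeline would evaluate.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors

def realize_and_score(smiles: str):
    """Turn a discrete proposal (SMILES) into continuous 3D coordinates,
    then score it with a cheap placeholder objective."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None  # invalid proposal: feed the failure back to the LLM
    mol = Chem.AddHs(mol)
    AllChem.EmbedMolecule(mol, randomSeed=0)   # initial 3D coordinates
    AllChem.MMFFOptimizeMolecule(mol)          # MMFF94 geometry relaxation
    return mol, Descriptors.MolWt(mol)         # placeholder for a QM property

mol, score = realize_and_score("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
print(score)
```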

Results

The empirical studies cover eight tasks, spanning both constitutive law discovery and molecular design. Some key findings are:

  • Constitutive Law Search: The proposed method significantly outperforms existing LLM-driven baselines in discovering accurate constitutive laws for materials.
  • Molecular Design: The SGA framework also excels in designing molecules with targeted quantum mechanical properties, often producing solutions that defy conventional expectations but hold up under expert scrutiny.

Strong Numerical Results: The paper provides detailed benchmark results. For example, in constitutive law discovery, the best solution achieved a loss of $5.2\times10^{-5}$, versus baseline losses reaching $298.5$ on some tasks.

Implications and Future Directions

Theoretical Implications

The approach highlights the utility of combining LLMs, which excel in abstract reasoning, with simulations that provide quantitative feedback. This could pave the way for more generalized AI frameworks capable of conducting complex scientific inquiries across various fields.

Practical Implications

For scientific and engineering domains, this means potentially faster discovery and refinement of new materials, medicines, and more. The integration of LLMs and simulations can democratize access to advanced research capabilities, leveling the playing field for smaller research institutions.

Future Work

Future research could focus on improving the interpretability and safety of LLM-generated solutions. The cost and efficiency of LLM inference at scale also present challenges that need addressing. Moreover, incorporating human feedback into the optimization process could further refine results and expand the scope of applicability.

Conclusion

The Scientific Generative Agent introduces a novel way to harness the strengths of LLMs and simulations for scientific discovery. By emulating the meticulous and iterative approach of human researchers, this bilevel optimization framework shows significant promise in discovering new scientific knowledge, outperforming traditional and LLM-based baselines in various challenging tasks. As the field progresses, integrating more domain-specific knowledge and addressing practical constraints will be crucial steps toward making this approach a standard tool in scientific research.

Authors
  1. Pingchuan Ma
  2. Tsun-Hsuan Wang
  3. Minghao Guo
  4. Zhiqing Sun
  5. Joshua B. Tenenbaum
  6. Daniela Rus
  7. Chuang Gan
  8. Wojciech Matusik