Can LLMs predict experimental outcomes?

Determine whether large language models (LLMs) trained on general text and scientific articles can predict the outcomes of scientific experiments.

Background

The paper motivates the need to assess the predictive capabilities of LLMs by noting the exponential growth of the scientific literature and the difficulty human experts face in integrating vast, noisy, and diverse findings. The authors propose BrainBench, a forward-looking benchmark designed to evaluate whether LLMs can predict neuroscience results from paper abstracts.

This open question frames the paper’s core investigation: if LLMs trained on general text and scientific articles could accurately forecast experimental outcomes, it would have major implications for scientific practice. The authors subsequently test multiple LLMs, including a neuroscience-specialized variant (BrainGPT), against human experts on BrainBench to evaluate this capability empirically.
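BrainBench frames this test as a two-alternative forced choice: each item pairs a paper's original abstract with an altered version in which the results were changed, and a model is credited when it prefers the original. As a rough illustration only, not the authors' exact evaluation harness, the sketch below scores each version by its perplexity under a Hugging Face causal language model and picks the less surprising one; the model name and function names are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM that yields a perplexity score would do.
MODEL_NAME = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the model (lower = less surprising)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])  # loss = mean token NLL
    return torch.exp(out.loss).item()

def choose_version(original: str, altered: str) -> str:
    """Two-alternative forced choice: prefer the abstract version the
    model finds less surprising."""
    return "original" if perplexity(original) < perplexity(altered) else "altered"
```

Human experts in the study face the same binary choice, which is what allows a direct accuracy comparison between models and experts.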

References

It is an open question whether LLMs, trained on general text and scientific articles, can predict the outcomes of experiments.

Luo et al. (4 Mar 2024). “Large language models surpass human experts in predicting neuroscience results.” arXiv:2403.03230, Introduction.