Translation of LLM benchmark performance to novice physical laboratory performance
Determine whether strong performance by frontier large language models on biological knowledge and protocol benchmarks (e.g., the Virology Capabilities Test and LAB-Bench) translates into improved performance by novices carrying out hands-on procedures in physical biology laboratories, including multi-step workflows that model viral reverse genetics.
References
Yet, whether this translates to improved human performance in the physical laboratory remains unclear.
— Measuring Mid-2025 LLM-Assistance on Novice Performance in Biology
(2602.16703 - Hong et al., 18 Feb 2026) in Abstract