- The paper demonstrates that delayed feedback from asynchronous operations increases cumulative regret and hampers Bayesian optimization performance.
- It compares four optimization strategies (random sampling, EI, qNEI, and mode cycling); the model-based strategies consistently outperform random sampling on noisy test functions.
- The study highlights a key trade-off between faster experimental throughput and optimization accuracy, guiding future improvements in real-world SDLs.
Introduction
Self-driving laboratories (SDLs) are automated systems that significantly expedite the discovery process for new materials by performing various synthesis and characterization tasks. These SDLs have multiple stages and can operate in two modes: serial, where one experiment is conducted at a time, and asynchronous parallel, where multiple experiments run simultaneously in different stages, thus reducing station downtime and increasing experimental throughput. However, asynchronous operation introduces delayed feedback from pending experiments, which can undermine the efficiency of the optimization algorithms used to determine the best experimental conditions.
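The delayed-feedback effect described above can be illustrated with a minimal sketch (an illustrative assumption, not the paper's code): at each step the optimizer proposes a new experiment while the results of the last `delay` proposals are still in flight, so it can only condition on completed experiments.

```python
import random
from collections import deque

def run_delayed_loop(objective, propose, n_steps, delay):
    """Asynchronous-parallel loop: a proposal's result only becomes
    visible to the optimizer `delay` steps after it is launched.
    With delay=0 this reduces to serial operation."""
    pending = deque()          # experiments still in flight: (x, y)
    observed = []              # completed results the optimizer can see
    for _ in range(n_steps):
        x = propose(observed)  # optimizer conditions on completed data only
        pending.append((x, objective(x)))
        if len(pending) > delay:
            observed.append(pending.popleft())  # oldest experiment finishes
    observed.extend(pending)   # flush remaining in-flight experiments
    return observed

# Example with a toy 1-D objective and random proposals:
random.seed(0)
results = run_delayed_loop(lambda x: x * x,
                           lambda obs: random.uniform(-1, 1),
                           n_steps=10, delay=3)
```

The key point the sketch makes concrete: with `delay=3`, the first three proposals are made with no feedback at all, and every later proposal ignores the three most recent experiments.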
Setup and Experiments
To address the challenge posed by delayed feedback, the authors built a simulation of a multi-stage SDL driven by Bayesian optimization. The simulation compared four optimization strategies: random sampling, Expected Improvement (EI), q-Noisy Expected Improvement (qNEI), and mode cycling, which combines upper confidence bound (UCB) acquisition with space-filling designs. Experiments were run on three test functions, each with added noise to mimic the variability of real-world measurements: the Ackley and Levy benchmarks, and an SDL test function derived from functional-coating conductivity data.
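As a concrete example of the kind of noisy benchmark described above, here is a minimal sketch of the Ackley function with additive Gaussian observation noise; the noise level `sigma` is an illustrative assumption, not the value used in the paper.

```python
import math
import random

def ackley(x, a=20.0, b=0.2, c=2 * math.pi):
    """Standard Ackley test function; global minimum f(0, ..., 0) = 0."""
    d = len(x)
    s1 = sum(xi ** 2 for xi in x) / d
    s2 = sum(math.cos(c * xi) for xi in x) / d
    return -a * math.exp(-b * math.sqrt(s1)) - math.exp(s2) + a + math.e

def noisy_ackley(x, sigma=0.5, rng=random):
    """Ackley with additive Gaussian noise, standing in for the
    measurement variability of a real SDL experiment."""
    return ackley(x) + rng.gauss(0.0, sigma)
```

Because the noiseless optimum is known (zero at the origin), regret can be computed exactly, which is what makes such benchmarks convenient for comparing acquisition strategies.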
Results & Discussion
The simulation results confirm that delayed feedback degrades optimization performance. Although EI, qNEI, and mode cycling all perform comparably well in minimizing cumulative regret (a measure of optimizer performance), there is a clear trade-off between the throughput gained by running experiments in parallel and the performance lost to delayed feedback: as delay increases, so does cumulative regret. Moreover, increasing the problem's dimensionality raises cumulative regret more sharply than increasing the number of parallel stages (i.e., the delay).
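For a minimization run, cumulative regret can be written as R_T = Σ_t (f(x_t) − f*), where f* is the true optimum. A minimal sketch (an illustrative helper, not the paper's implementation):

```python
def cumulative_regret(observations, f_star):
    """Cumulative regret R_T = sum over t of (f(x_t) - f*),
    for a minimization problem with known optimum f_star.

    observations: noiseless objective values at the points queried,
    in the order they were proposed.
    """
    return sum(f - f_star for f in observations)

# Example: three queries at values 3, 2, 1 against an optimum of 1
# accumulate regret (3-1) + (2-1) + (1-1) = 3.
r = cumulative_regret([3.0, 2.0, 1.0], f_star=1.0)
```

Unlike simple regret (the gap between the best point found and f*), cumulative regret penalizes every suboptimal query, which is why delay, forcing the optimizer to propose points without its most recent data, drives it upward.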
Conclusion
This paper confirms that running SDL experiments in parallel to increase throughput comes at a cost to the performance of Bayesian optimization algorithms. The simulations clarify the balance between faster discovery of the global optimum and higher experimental throughput. While none of the tested optimization strategies significantly outperformed the others in the presence of delay, all of them exceeded the effectiveness of random sampling. Future research could implement these findings in real-world SDLs and refine search strategies to further enhance their practical application.