- The paper demonstrates that delayed feedback from asynchronous operations increases cumulative regret and hampers Bayesian optimization performance.
- It compares four optimization strategies (random sampling, EI, qNEI, and mode cycling); the model-based strategies consistently outperform random sampling on noisy test functions.
- The study highlights a key trade-off between faster experimental throughput and optimization accuracy, guiding future improvements in real-world SDLs.
Introduction
Self-driving laboratories (SDLs) are automated systems that significantly expedite the discovery process for new materials by performing various synthesis and characterization tasks. These SDLs have multiple stages and can operate in two modes: serial, where one experiment is conducted at a time, and asynchronous parallel, where multiple experiments run simultaneously in different stages, thus reducing station downtime and increasing experimental throughput. However, asynchronous operation introduces delayed feedback from pending experiments, which can undermine the efficiency of the optimization algorithms used to determine the best experimental conditions.
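The delayed-feedback effect described above can be illustrated with a minimal sketch (an illustrative assumption, not the paper's code): at each step the optimizer proposes a new experiment while the results of the last `delay` proposals are still in flight, so it can only condition on completed experiments.

```python
import random
from collections import deque

def run_delayed_loop(objective, propose, n_steps, delay):
    """Asynchronous-parallel loop: a proposal's result only becomes
    visible to the optimizer `delay` steps after it is launched.
    With delay=0 this reduces to serial operation."""
    pending = deque()          # experiments still in flight: (x, y)
    observed = []              # completed results the optimizer can see
    for _ in range(n_steps):
        x = propose(observed)  # optimizer conditions on completed data only
        pending.append((x, objective(x)))
        if len(pending) > delay:
            observed.append(pending.popleft())  # oldest experiment finishes
    observed.extend(pending)   # flush remaining in-flight experiments
    return observed

# Example with a toy 1-D objective and random proposals:
random.seed(0)
results = run_delayed_loop(lambda x: x * x,
                           lambda obs: random.uniform(-1, 1),
                           n_steps=10, delay=3)
```

The key point the sketch makes concrete: with `delay=3`, the first three proposals are made with no feedback at all, and every later proposal ignores the three most recent experiments.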
Setup and Experiments
To address the challenge posed by delayed feedback, the authors built a simulation of a multi-stage SDL driven by Bayesian optimization. The simulation compared four optimization strategies: random sampling, Expected Improvement (EI), q-Noisy Expected Improvement (qNEI), and mode cycling, which combines upper confidence bound (UCB) acquisition with space-filling designs. Experiments were run on three test functions, each with added noise to mimic the variability of real-world measurements: the Ackley and Levy benchmarks, and an SDL test function derived from functional-coating conductivity data.
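As a concrete example of the kind of noisy benchmark described above, here is a minimal sketch of the Ackley function with additive Gaussian observation noise; the noise level `sigma` is an illustrative assumption, not the value used in the paper.

```python
import math
import random

def ackley(x, a=20.0, b=0.2, c=2 * math.pi):
    """Standard Ackley test function; global minimum f(0, ..., 0) = 0."""
    d = len(x)
    s1 = sum(xi ** 2 for xi in x) / d
    s2 = sum(math.cos(c * xi) for xi in x) / d
    return -a * math.exp(-b * math.sqrt(s1)) - math.exp(s2) + a + math.e

def noisy_ackley(x, sigma=0.5, rng=random):
    """Ackley with additive Gaussian noise, standing in for the
    measurement variability of a real SDL experiment."""
    return ackley(x) + rng.gauss(0.0, sigma)
```

Because the noiseless optimum is known (zero at the origin), regret can be computed exactly, which is what makes such benchmarks convenient for comparing acquisition strategies.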
Results & Discussion
The simulation results confirm that delayed feedback degrades optimization performance. Although EI, qNEI, and mode cycling all perform comparably well in minimizing cumulative regret (a measure of optimizer performance), there is a clear trade-off between the throughput gained by running experiments in parallel and the performance lost to delayed feedback: as delay increases, so does cumulative regret. Moreover, increasing the problem's dimensionality raises cumulative regret more sharply than increasing the number of parallel stages (i.e., the delay).
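For a minimization run, cumulative regret can be written as R_T = Σ_t (f(x_t) − f*), where f* is the true optimum. A minimal sketch (an illustrative helper, not the paper's implementation):

```python
def cumulative_regret(observations, f_star):
    """Cumulative regret R_T = sum over t of (f(x_t) - f*),
    for a minimization problem with known optimum f_star.

    observations: noiseless objective values at the points queried,
    in the order they were proposed.
    """
    return sum(f - f_star for f in observations)

# Example: three queries at values 3, 2, 1 against an optimum of 1
# accumulate regret (3-1) + (2-1) + (1-1) = 3.
r = cumulative_regret([3.0, 2.0, 1.0], f_star=1.0)
```

Unlike simple regret (the gap between the best point found and f*), cumulative regret penalizes every suboptimal query, which is why delay, forcing the optimizer to propose points without its most recent data, drives it upward.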
Conclusion
This paper confirms that running SDL experiments in parallel to increase throughput comes at a cost to the performance of Bayesian optimization algorithms. The simulations clarify the balance between faster discovery of the global optimum and higher experimental throughput. While none of the tested optimization strategies significantly outperformed the others in the presence of delay, all of them exceeded the effectiveness of random sampling. Future research could implement these findings in real-world SDLs and refine search strategies to further enhance their practical application.