Bayesian Optimization for Scientific Discovery
- Bayesian optimization is a sequential decision-making approach that uses Gaussian process surrogates to efficiently navigate expensive evaluation spaces.
- It employs acquisition functions such as Expected Improvement and Knowledge-Gradient to balance exploration and exploitation during experiments.
- This method accelerates discoveries in fields like materials design by reducing the number of costly evaluations required for optimal outcomes.
Bayesian optimization for scientific discovery is a sequential decision-making methodology for maximizing (or minimizing) expensive-to-evaluate functions where each experimental or computational evaluation is resource-intensive. Its core paradigm centers on constructing a probabilistic surrogate model—most classically a Gaussian process (GP)—to represent prior beliefs about the unknown function. At each iteration, new experiments or evaluations are selected by optimizing an acquisition function, which quantifies the expected value of information gained by querying candidate points. This framework supports rapid convergence to optimal design solutions, balances exploration with exploitation, and enables data-efficient navigation of complex scientific spaces across materials, chemistry, and broader experimental domains.
1. Foundations of Bayesian Optimization
Bayesian optimization addresses the generic problem $\max_{x \in \mathcal{X}} f(x)$, where $f(x)$ represents the quality of a design, $\mathcal{X}$ is the feasible parameter space, and each function evaluation is costly. The Bayesian aspect enters by treating $f$ as a random function and maintaining a posterior distribution on $f$ conditioned on previously observed pairs $(x_1, y_1), \dots, (x_n, y_n)$. The surrogate model, denoted $f \mid \mathcal{D}_n$ after $n$ observations, captures both mean predictions $\mu_n(x)$ and predictive uncertainty $\sigma_n(x)$ at unmeasured $x$.
The experiment selection process is formalized through maximization of acquisition functions derived by value-of-information principles. Rather than grid search or naive trial-and-error, Bayesian optimization targets new experiments to maximize the expected increase in utility, where utility is often quantified as improvement over the current best observation or anticipated reduction in uncertainty.
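The selection cycle can be summarized in a short sketch. The following Python skeleton is illustrative rather than drawn from any particular library: `fit_surrogate`, `maximize_acquisition`, and `run_experiment` are hypothetical callables supplied by the user for the surrogate fit, the acquisition maximization, and the costly evaluation, respectively.

```python
# Minimal sketch of the generic sequential loop. The three callables are
# hypothetical placeholders standing in for a surrogate fit, an acquisition
# maximizer, and the expensive experiment itself.

def bayes_opt_loop(run_experiment, fit_surrogate, maximize_acquisition,
                   initial_data, budget):
    """Return the best (x, y) pair found after `budget` sequential evaluations."""
    data = list(initial_data)                       # observed (x, y) pairs
    for _ in range(budget):
        model = fit_surrogate(data)                 # posterior over f given current data
        x_next = maximize_acquisition(model, data)  # e.g., an EI or KG maximizer
        y_next = run_experiment(x_next)             # costly evaluation of f at x_next
        data.append((x_next, y_next))               # assimilate the new observation
    return max(data, key=lambda pair: pair[1])      # incumbent best design
```

Sections 2 and 3 below fill in concrete choices for the surrogate model and the acquisition function.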
2. Gaussian Process Regression as Surrogate Model
Gaussian process regression (GPR) provides a highly flexible, nonparametric prior for the unknown function, defined by its mean function $\mu_0(x)$ (often constant) and covariance kernel $k(x, x')$. Conditioning the surrogate model on previously observed data $\mathcal{D}_n = \{(x_i, y_i)\}_{i=1}^{n}$ yields closed-form expressions for the posterior mean and variance at any $x$:
$$\mu_n(x) = \mu_0(x) + k_n(x)^{\top} K_n^{-1}\,(y - \mu_0), \qquad \sigma_n^2(x) = k(x, x) - k_n(x)^{\top} K_n^{-1} k_n(x),$$
where $k_n(x) = [k(x, x_1), \dots, k(x, x_n)]^{\top}$, $K_n$ is the $n \times n$ matrix with entries $k(x_i, x_j)$, $y = [y_1, \dots, y_n]^{\top}$, and $\mu_0 = [\mu_0(x_1), \dots, \mu_0(x_n)]^{\top}$. When measurements are noisy with variance $\sigma_\varepsilon^2$, this is accommodated by adding $\sigma_\varepsilon^2 I$ to the covariance matrix $K_n$.
Kernel selection, e.g., squared-exponential or Matérn kernels, controls the assumed smoothness and correlation structure of the latent function. Hyperparameters such as length scales and output variance are typically estimated via marginal likelihood maximization from observed data, supporting adaptive model refinement as additional data accrue.
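A minimal NumPy sketch of these posterior formulas is given below, assuming a squared-exponential kernel, a zero prior mean, and fixed hyperparameters (in practice tuned by marginal likelihood maximization as described above); the function names are illustrative, not taken from any library.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale=1.0, output_var=1.0):
    """Squared-exponential kernel k(a, b) = s^2 * exp(-||a - b||^2 / (2 l^2))."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return output_var * np.exp(-0.5 * sq_dists / length_scale**2)

def gp_posterior(X_obs, y_obs, X_query, noise_var=1e-6,
                 length_scale=1.0, output_var=1.0):
    """Posterior mean and variance at X_query (zero prior mean assumed for brevity)."""
    K = sq_exp_kernel(X_obs, X_obs, length_scale, output_var)
    K += noise_var * np.eye(len(X_obs))            # noisy observations: add sigma_eps^2 * I
    k_star = sq_exp_kernel(X_obs, X_query, length_scale, output_var)
    alpha = np.linalg.solve(K, y_obs)              # K_n^{-1} y
    mean = k_star.T @ alpha                        # mu_n(x) = k_n(x)^T K_n^{-1} y
    v = np.linalg.solve(K, k_star)                 # K_n^{-1} k_n(x)
    var = output_var - np.sum(k_star * v, axis=0)  # k(x, x) - k_n(x)^T K_n^{-1} k_n(x)
    return mean, np.maximum(var, 1e-12)            # clip tiny negatives from round-off
```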
3. Sequential Experimentation: Expected Improvement and Knowledge-Gradient
Experiment selection uses acquisition functions that quantify, for each candidate $x$, the expected gain in objective value or knowledge:
- Expected Improvement (EI):
For noiseless experiments, EI at $x$ is:
$$\mathrm{EI}_n(x) = \mathbb{E}_n\!\left[(f(x) - f_n^{*})^{+}\right] = \left(\mu_n(x) - f_n^{*}\right)\Phi(z) + \sigma_n(x)\,\varphi(z), \qquad z = \frac{\mu_n(x) - f_n^{*}}{\sigma_n(x)},$$
where $f_n^{*} = \max_{i \le n} y_i$ is the best observed value, and $\Phi$ and $\varphi$ are the cumulative distribution and probability density functions of the standard normal distribution. EI quantifies the mean improvement over the best measurement, balancing exploitation ($\mu_n(x) - f_n^{*}$ large) and exploration ($\sigma_n(x)$ large). In noiseless settings, observed points are not revisited ($\mathrm{EI}_n(x_i) = 0$).
- Knowledge-Gradient (KG):
When observations may be noisy or the final selection occurs from a list of candidate designs, KG selects the next experiment to maximize the expected improvement in the surrogate's best predicted mean after assimilation of the next observation:
$$\mathrm{KG}_n(x) = \mathbb{E}_n\!\left[\max_{x' \in \mathcal{A}} \mu_{n+1}(x') \,\middle|\, x_{n+1} = x\right] - \max_{x' \in \mathcal{A}} \mu_n(x').$$
Here $\mu_n$ is the current posterior mean as evaluated on the candidate set $\mathcal{A}$; $\mu_{n+1}$ is the updated estimate incorporating the new data. KG generalizes EI to noisy and more general design settings, and its computation typically involves closed-form or numerical integration over the normal predictive distribution.
Both EI and KG are derived from value-of-information analysis and are "one-step Bayes-optimal," meaning each selects the next query to maximize the immediate expected benefit in selecting the overall best design, given all current data.
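As a numerical illustration (not from the source), the sketch below evaluates the closed-form EI from posterior means and standard deviations, and estimates KG by Monte Carlo over a discrete candidate set; the `posterior_fn` argument is assumed to behave like the `gp_posterior` sketch above, returning posterior means and variances, and the defaults for `noise_var` and `n_samples` are illustrative.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mean, std, f_best):
    """Closed-form EI for Gaussian posteriors; `mean` and `std` are arrays over candidates."""
    std = np.maximum(std, 1e-12)          # guard against zero predictive std at observed points
    z = (mean - f_best) / std
    return (mean - f_best) * norm.cdf(z) + std * norm.pdf(z)

def knowledge_gradient_mc(posterior_fn, X_obs, y_obs, X_cand, x_new,
                          noise_var=1e-6, n_samples=64, rng=None):
    """Monte Carlo KG for one candidate `x_new` over a discrete set `X_cand`.

    `posterior_fn(X_obs, y_obs, X_query)` must return (mean, variance) arrays,
    e.g. the gp_posterior sketch above (an assumed helper, not a library call).
    """
    rng = np.random.default_rng() if rng is None else rng
    mean_cand, _ = posterior_fn(X_obs, y_obs, X_cand)
    best_now = mean_cand.max()                            # max_x' mu_n(x')
    mean_new, var_new = posterior_fn(X_obs, y_obs, x_new[None, :])
    y_draws = rng.normal(mean_new[0], np.sqrt(var_new[0] + noise_var), n_samples)
    X_aug = np.vstack([X_obs, x_new])                     # design augmented with x_new
    best_later = np.empty(n_samples)
    for i, y_sim in enumerate(y_draws):                   # fantasize the next observation
        mean_next, _ = posterior_fn(X_aug, np.append(y_obs, y_sim), X_cand)
        best_later[i] = mean_next.max()                   # max_x' mu_{n+1}(x')
    return best_later.mean() - best_now                   # expected gain in best posterior mean
```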
4. Implementation in Materials Design
Bayesian optimization has been applied in materials discovery to optimize compositional ratios, process conditions (e.g., temperature, pressure), or other property-defining variables. The design problem takes the form $\max_{x \in \mathcal{X}} f(x)$, where $x$ encodes material composition or process parameters, and $f(x)$ expresses the target property (e.g., modulus, conductivity, yield).
Key implementation aspects:
- Parameter definition: $x$ can represent normalized component ratios (subject to affine constraints, e.g., $\sum_i x_i = 1$ with $x_i \ge 0$) or bounded process parameters.
- Experiment guidance: The GP model is fit to prior results. At each step, the acquisition function (EI or KG) is maximized over the feasible design space, and the corresponding experiment is run. Posterior updates incorporate new data, cycling the sequential design loop.
- Hyperparameter estimation: Rather than being selected ad hoc, surrogate kernel and model hyperparameters are adaptively tuned via marginal likelihood maximization on the observed data at each loop iteration.
Substantially fewer experiments are typically required than with grid or random search, accelerating convergence to optimal or near-optimal material systems.
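A toy end-to-end loop in the spirit of this workflow is sketched below. It reuses the `gp_posterior` and `expected_improvement` helpers from the earlier sketches, invents a synthetic `measure_property` function as a stand-in for a real property measurement over one normalized composition fraction, and keeps kernel hyperparameters fixed for brevity rather than re-estimating them at each iteration as the text prescribes.

```python
import numpy as np

# Toy stand-in for a costly experiment: a smooth property curve over one
# normalized composition fraction x in [0, 1]. Purely illustrative.
def measure_property(x):
    return float(np.sin(6 * x) * (1 - x) + 0.5 * x)

# Reuses gp_posterior and expected_improvement from the sketches above.
rng = np.random.default_rng(0)
X_cand = np.linspace(0.0, 1.0, 201)[:, None]        # discretized design space
X_obs = rng.uniform(0.0, 1.0, size=(3, 1))          # small initial design
y_obs = np.array([measure_property(x[0]) for x in X_obs])

for _ in range(10):                                  # sequential design loop
    mean, var = gp_posterior(X_obs, y_obs, X_cand, length_scale=0.2)
    ei = expected_improvement(mean, np.sqrt(var), y_obs.max())
    x_next = X_cand[np.argmax(ei)]                   # maximize acquisition on the grid
    y_next = measure_property(x_next[0])             # run the (toy) experiment
    X_obs = np.vstack([X_obs, x_next])               # assimilate the new data
    y_obs = np.append(y_obs, y_next)

print("best composition:", X_obs[np.argmax(y_obs), 0], "best value:", y_obs.max())
```

In a real campaign, `measure_property` would be replaced by the laboratory experiment and the kernel hyperparameters refit by marginal likelihood maximization at each iteration.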
5. Extensions to Broader Scientific Discovery
Bayesian optimization techniques generalize to a diverse array of scientific disciplines wherever each experiment is expensive and data are scarce. Key points of generalization include:
- Surrogate modeling: As long as the response function can be statistically described via a continuous surrogate (e.g., GP), the methodology applies across chemistry, biology, physics, hyperparameter optimization for machine learning, or chemical process control.
- Exploration–exploitation trade-off: The balance between sampling uncertain regions and exploiting current optima, as operationalized by the acquisition function, is fundamental regardless of the domain.
- Custom kernels: The ability to select or design kernels in the GP allows domain-specific encoding of smoothness, periodicity, or other prior assumptions about the process (see the sketch after this list).
- Extension to multistep/sequential design: While EI and KG are one-step Bayes-optimal, the underlying value-of-information ideas underpin extensions to more complex, multi-step or multi-objective design problems.
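As a sketch of this flexibility (the kernel choices and hyperparameters here are illustrative, not prescribed by the source), kernels can be written directly and composed, for example multiplying a Matérn-5/2 kernel by a periodic kernel to encode locally periodic structure; any of these could replace the squared-exponential kernel in the earlier `gp_posterior` sketch.

```python
import numpy as np

def matern52_kernel(A, B, length_scale=1.0, output_var=1.0):
    """Matérn-5/2 kernel: rougher sample paths than the squared-exponential."""
    r = np.sqrt(np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1))
    s = np.sqrt(5.0) * r / length_scale
    return output_var * (1.0 + s + s**2 / 3.0) * np.exp(-s)

def periodic_kernel(A, B, period=1.0, length_scale=1.0, output_var=1.0):
    """Periodic kernel: encodes a known repeat structure of length `period`."""
    r = np.sqrt(np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1))
    return output_var * np.exp(-2.0 * np.sin(np.pi * r / period) ** 2 / length_scale**2)

def composite_kernel(A, B):
    """Product of kernels: periodic behaviour whose correlation decays with distance."""
    return matern52_kernel(A, B, length_scale=2.0) * periodic_kernel(A, B, period=0.5)
```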
6. Summary and Methodological Implications
Bayesian optimization, underpinned by GP regression and one-step Bayes-optimal acquisition functions, provides a highly adaptive, value-of-information-motivated framework for sequential experiment planning in scientific discovery. It accelerates the identification of optimal (or near-optimal) designs under strict experimental budgets by updating beliefs after each observation and guiding subsequent sampling to maximize expected knowledge gain or improvement.
This methodology not only optimizes sample efficiency in experimental sciences such as materials discovery but also constitutes a generic and extensible approach for black-box optimization problems where each evaluation is costly and uncertainty must be systematically managed.