Parallel Expected Improvement (q-EI)

Updated 2 June 2026
  • Parallel Expected Improvement (q-EI) is a Bayesian optimization criterion that selects a batch of candidate points for parallel evaluation of expensive black-box objectives using a Gaussian process surrogate.
  • It integrates dependencies among batch points through multivariate normal distributions and employs analytic or stochastic gradient estimators for efficient optimization.
  • Log-transform reformulations and Monte Carlo methods enhance numerical stability and scalability, particularly in high-dimensional settings and large batch scenarios.

Parallel Expected Improvement (q-EI) is a Bayesian optimization acquisition criterion for selecting a batch of $q$ input locations to evaluate expensive black-box objectives in parallel, under a Gaussian process (GP) surrogate model. It quantifies the expected gain, relative to the current incumbent, from simultaneously evaluating several candidate points, thereby enabling efficient use of parallel computational resources in global optimization. Unlike sequential Expected Improvement (EI), the q-EI criterion internalizes dependencies and interaction effects among all batch points. Mathematical, computational, and numerical properties of q-EI have been extensively developed, with significant focus on analytic expansions, stochastic gradient estimators, and numerically robust reformulations for high-dimensional and large-batch regimes (Marmin et al., 2015, Wang et al., 2016, Ning et al., 2020, Ament et al., 2023).

1. Formal Definition and Analytic Structure

Given a set of $n$ completed evaluations $\{(x_i, y_i)\}_{i=1}^n$ and the best observed value $f^* = \min_{i} y_i$ (for minimization), the batch q-EI for new points $X = (x^{(1)}, \dots, x^{(q)})$ is defined as:

$$\mathrm{EI}_q(X) = \mathbb{E}\bigl[ \max\{0, f^* - \min_{i=1}^q f(x^{(i)}) \} \bigr]$$

where the joint predictive distribution under the GP posterior is multivariate normal,

$$f(X) \sim \mathcal{N}_q\left( \mu(X), \Sigma(X) \right)$$

with $\mu(X) \in \mathbb{R}^q$ and $\Sigma(X) \in \mathbb{R}^{q \times q}$. This leads to an integral form:

$$\mathrm{EI}_q(X) = \int_{\mathbb{R}^q} (f^* - \min_i z_i)_+ \;\phi_q(z; \mu(X), \Sigma(X))\, dz$$

where $\phi_q$ denotes the $q$-variate normal density.

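For practical computation, this can be decomposed as a sum of $q$ lower-dimensional quadrature integrals:

$$\mathrm{EI}_q(X) = \sum_{k=1}^{q} \int_{-\infty}^{f^*} (f^* - z)\, \phi(z; \mu_k, \Sigma_{kk})\, \Phi_{q-1}\bigl(u_k(z)\bigr)\, dz$$

where $\phi$ is the univariate normal density, $\Phi_{q-1}$ is the $(q-1)$-dimensional normal CDF evaluated at a shifted and scaled argument $u_k(z)$ determined by the conditional law of the remaining coordinates given $z_k = z$, and each term corresponds to the event that the $k$th coordinate yields the minimum (Ning et al., 2020).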
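As a sanity check on the definition, the single-point case $q = 1$ involves no multivariate CDF and reduces to the classical EI closed form for minimization. Here $\mu(x)$ and $\sigma(x)$ denote the GP posterior mean and standard deviation at $x$; the symbol $u$ is introduced only for this illustration.

```latex
\mathrm{EI}_1(x)
  = \mathbb{E}\bigl[(f^* - f(x))_+\bigr]
  = (f^* - \mu(x))\,\Phi(u) + \sigma(x)\,\phi(u),
\qquad
u = \frac{f^* - \mu(x)}{\sigma(x)}
```

With $\mu(x) = f^*$ and $\sigma(x) = 1$ this gives $\mathrm{EI}_1 = \phi(0) \approx 0.399$, a convenient reference value when testing a q-EI implementation.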

An alternative (maximization) form, as applied in (Marmin et al., 2015) and (Wang et al., 2016), conditions on the event that each batch point attains the maximum, resulting in analytic expansions via Tallis's formula and its extensions, where all terms are explicit functions of GP posterior moments and multivariate normal probabilities.
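The expectation in the definition can also be approximated directly by sampling from the joint posterior. The following is a minimal sketch in the minimization convention, assuming the posterior mean vector and covariance matrix of the batch are already available; the function name `qei_monte_carlo` and the numerical values are hypothetical and not taken from the cited papers.

```python
import numpy as np

def qei_monte_carlo(mu, sigma, f_best, n_samples=100_000, seed=0):
    """Monte Carlo estimate of q-EI (minimization) from GP posterior moments."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    # Small jitter keeps the Cholesky factorization stable for near-singular covariances.
    cov = np.asarray(sigma, dtype=float) + 1e-12 * np.eye(mu.size)
    chol = np.linalg.cholesky(cov)
    z = rng.standard_normal((n_samples, mu.size))
    samples = mu + z @ chol.T  # joint draws of f(X), shape (n_samples, q)
    improvement = np.maximum(0.0, f_best - samples.min(axis=1))
    return improvement.mean()

# Hypothetical posterior for a batch of q = 2 correlated points.
mu = np.array([0.2, 0.1])
sigma = np.array([[0.5, 0.3], [0.3, 0.4]])
batch_value = qei_monte_carlo(mu, sigma, f_best=0.0)
print(batch_value)
```

Because the samples are drawn jointly, correlation between batch points is accounted for automatically, which is the property that distinguishes q-EI from a sum of one-point EI values.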

2. Computational Complexity and Scaling Properties

The direct computation of q-EI involves:

  • Multivariate normal CDF evaluations in dimension up to $q$, whose per-call cost grows with $q$ when computed with, e.g., Genz's adaptive integration.
  • Analytic gradients with respect to all $q \times d$ batch-location variables, incurring a number of CDF calls per batch evaluation that grows rapidly with $q$ if all terms are kept exact (Marmin et al., 2015).
  • Curse of dimensionality: Both the batch size ($q$) and the input dimension ($d$) contribute multiplicatively to the size of the optimization landscape; practical exact evaluation becomes infeasible once $q$ or $d$ grows beyond small values (Ning et al., 2020).

Finite-difference gradients require a further q-EI evaluation, and hence a further set of CDF calls, for every perturbed coordinate, making analytic or stochastic gradients necessary for efficiency (Marmin et al., 2015, Wang et al., 2016).

3. Gradient Estimation and Optimization Techniques

Closed-form analytic gradients have been derived for q-EI, leveraging Gaussian identities, conditional laws, and matrix algebra. For the analytic gradient, all derivatives of predictive means, covariances, univariate and multivariate normal functions are tractable (see explicit expressions in (Marmin et al., 2015)).

As $q$ increases, Monte Carlo approaches become dominant:

  • Infinitesimal Perturbation Analysis (IPA) constructs an unbiased stochastic gradient estimator by differentiating through the realized batch improvement, yielding

$$\nabla_X \mathrm{EI}_q(X) \approx \frac{1}{M} \sum_{m=1}^{M} \nabla_X h(X, Z^{(m)})$$

where $h(X, Z) = \bigl(f^* - \min_i [\mu(X) + L(X) Z]_i\bigr)_+$ is the sample-wise batch improvement, $L(X)$ is the Cholesky factor of $\Sigma(X)$, and $Z^{(m)}$ are i.i.d. draws from $\mathcal{N}(0, I_q)$ (Wang et al., 2016).
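The pathwise idea behind IPA can be illustrated on a simplified problem. The sketch below differentiates the sample-wise improvement with respect to the posterior mean vector only, rather than the batch locations; a full implementation would chain this through the GP posterior as a function of the inputs. The function names and test values are hypothetical, and the minimization convention is assumed.

```python
import numpy as np

def qei_fixed_samples(mu, chol, f_best, z):
    """q-EI estimate for fixed base samples z (common random numbers)."""
    samples = mu + z @ chol.T
    return np.maximum(0.0, f_best - samples.min(axis=1)).mean()

def ipa_grad_wrt_mean(mu, chol, f_best, z):
    """Pathwise (IPA) estimate of d q-EI / d mu for fixed base samples z."""
    samples = mu + z @ chol.T                # shape (M, q)
    winners = samples.argmin(axis=1)         # coordinate attaining the batch minimum
    improves = samples.min(axis=1) < f_best  # samples with positive improvement
    grad = np.zeros_like(mu)
    # d/d mu_j of (f_best - min_i s_i)_+ is -1 when j wins and the sample improves.
    np.add.at(grad, winners[improves], -1.0)
    return grad / z.shape[0]

# Hypothetical posterior moments for a batch of q = 3 points.
rng = np.random.default_rng(0)
z = rng.standard_normal((20_000, 3))
mu = np.array([0.1, -0.2, 0.3])
cov = np.array([[1.0, 0.4, 0.1], [0.4, 1.0, 0.2], [0.1, 0.2, 1.0]])
chol = np.linalg.cholesky(cov)
grad = ipa_grad_wrt_mean(mu, chol, 0.0, z)
print(grad)
```

Each sample contributes a gradient only through the coordinate that attains the minimum, which is why a single pass over the samples suffices and no extra acquisition evaluations are needed.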

Importance-resampling (SIR) approaches reduce the burden by generating a candidate pool (via Sobol or Latin Hypercube sequences), weighting by classical EI, and drawing batch points in a way that heuristically targets high q-EI values, so that per-stage CPU time depends on the number of candidates in the pool (Ning et al., 2020).
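The resampling step can be sketched as follows. This is an illustrative simplification, not the procedure of Ning et al. (2020): it draws a Latin hypercube pool, weights each candidate by one-point EI, and samples $q$ distinct candidates in proportion to those weights. The posterior mean and standard deviation below are hypothetical stand-ins for GP predictions, and all function names are assumptions.

```python
import math
import numpy as np

def latin_hypercube(n, d, rng):
    """n points in [0, 1)^d with one point per stratum in every dimension."""
    strata = np.stack([rng.permutation(n) for _ in range(d)], axis=1)
    return (strata + rng.random((n, d))) / n

def expected_improvement(mu, sd, f_best):
    """Closed-form one-point EI for minimization."""
    u = (f_best - mu) / sd
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(u / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * u**2) / math.sqrt(2.0 * math.pi)
    return (f_best - mu) * cdf + sd * pdf

def sir_batch(candidates, mu, sd, f_best, q, rng):
    """Resample q distinct candidates with probability proportional to EI."""
    # Clip rounding noise and add a tiny floor (an assumption of this sketch)
    # so that sampling without replacement always has enough nonzero weights.
    weights = np.maximum(expected_improvement(mu, sd, f_best), 0.0) + 1e-12
    idx = rng.choice(len(candidates), size=q, replace=False, p=weights / weights.sum())
    return candidates[idx]

rng = np.random.default_rng(1)
pool = latin_hypercube(64, 3, rng)
# Hypothetical stand-ins for the GP posterior mean and standard deviation.
post_mu = ((pool - 0.5) ** 2).sum(axis=1)
post_sd = 0.1 + 0.2 * pool[:, 0]
batch = sir_batch(pool, post_mu, post_sd, f_best=0.05, q=4, rng=rng)
print(batch)
```

The cost is one EI evaluation per candidate plus a weighted draw, so no multivariate CDF is evaluated at any point.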

4. Numerical Pathologies and Robust Reformulations

Canonical q-EI suffers from vanishing acquisition values and gradients in moderate-to-large data regimes:

  • As BO proceeds, posterior means rarely exceed the incumbent, causing improvement terms to be numerically zero for most batch configurations.
  • Analytic q-EI gradients decay rapidly in floating-point arithmetic for negative scaled improvement, often dropping below machine precision, especially in high dimensions or large batches.
  • The Monte Carlo estimator has exactly zero gradient whenever no sample improves on the incumbent, because the positive-part and max operations are flat in that region (Ament et al., 2023).

The LogEI-q reformulation addresses these pathologies:

  • Applies a log-transformation to EI (and its batch generalization) for numerical stability, replacing the hard positive-part and max operations with softplus and smooth maximums (e.g., a temperature-scaled log-sum-exp).
  • Ensures strong, non-vanishing gradients even in regions where canonical q-EI stalls.
  • Has the same or approximately identical optima as canonical q-EI by monotonicity of log, and smoothing errors are controllable by temperature parameters ($\tau_0$, $\tau_{\max}$) (Ament et al., 2023).
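The effect of the smoothing can be seen on a small example. The sketch below replaces the max over the batch with a temperature-scaled log-sum-exp and the positive part with a softplus, then averages in log space. It illustrates the general principle only and is not the exact formulation of Ament et al. (2023); the function names and default temperatures are assumptions.

```python
import numpy as np

def log_softplus(x, tau):
    """log(tau * log(1 + exp(x / tau))), stable for very negative x."""
    t = x / tau
    # For t << 0, softplus(t) ~ exp(t), so its log is ~ t; this avoids log(0).
    safe = np.log(np.logaddexp(0.0, np.maximum(t, -30.0)))
    return np.log(tau) + np.where(t < -30.0, t, safe)

def smoothed_log_qei(samples, f_best, tau_0=1e-3, tau_max=1e-3):
    """Log of a smoothed MC q-EI estimate (minimization), computed in log space."""
    gains = (f_best - samples) / tau_max
    # Smooth maximum over the batch: tau_max * logsumexp(gains).
    smooth_max = tau_max * np.logaddexp.reduce(gains, axis=1)
    log_terms = log_softplus(smooth_max, tau_0)
    # log of the sample mean, via logsumexp over samples.
    return np.logaddexp.reduce(log_terms) - np.log(log_terms.size)

def naive_log_qei(samples, f_best):
    """Log of the canonical MC q-EI estimate; -inf when no sample improves."""
    with np.errstate(divide="ignore"):
        return np.log(np.maximum(0.0, f_best - samples.min(axis=1)).mean())

improving = np.array([[-1.0, 2.0], [-2.0, 3.0]])  # some samples beat f_best = 0
stalled = np.array([[5.0, 6.0], [7.0, 8.0]])      # no sample beats f_best = 0
print(naive_log_qei(improving, 0.0), smoothed_log_qei(improving, 0.0))
print(naive_log_qei(stalled, 0.0), smoothed_log_qei(stalled, 0.0))
```

When improvement is clearly positive the two values agree closely, while in the stalled case the canonical estimate collapses to `-inf` and the smoothed one stays finite and continues to rank candidates.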

5. Algorithmic Implementations and Empirical Performance

Closed-form q-EI maximization involves multistart BFGS/quasi-Newton in $qd$ dimensions, with analytic gradients enabling tractable optimization for small batch sizes (Marmin et al., 2015). For larger $q$ or higher $d$, stochastic gradient-based maximization or SIR-based batch assembly becomes essential:

  • MC-based stochastic optimizers (as in MOE) draw many MC samples to estimate q-EI and its gradient, using restarts and step-size schedules (Wang et al., 2016).
  • LogEI-q can be optimized directly using autodiff-enabled libraries with MC sampling, gradient smoothing, and robust handling of constraints and noise (Ament et al., 2023).
  • Accelerated EGO via SIR randomly samples well-spread candidate batches, weights them by one-point EI, and assembles diverse, high-EI batches with low clustering and competitive empirical regret, reducing computational cost and iteration count drastically for $q$ up to 15 and $d$ up to 12 (Ning et al., 2020).

Empirically:

  • Analytic q-EI outperforms heuristic BUCB in one-step and worst-case regret but is more computationally expensive; SIR and MC-based methods recover similar or better performance at a fraction of the wall-clock cost (Marmin et al., 2015, Ning et al., 2020).
  • LogEI-q achieves 2–5x lower regret than classic MC q-EI for larger batch sizes in high-dimensional settings and avoids zero-gradient stagnation (Ament et al., 2023).

6. Practical Considerations and Recommendations

  • For small $q$, both analytic and MC approaches are practical, but IPA-based MC gradient ascent is easily parallelized and extensible (Wang et al., 2016).
  • For moderate to large $q$, MC-based or SIR-based batch construction, with robustified acquisition (LogEI-q), should be preferred.
  • Numerical stability is critical: Use log-transformed EI and smooth max/plus operators, tune temperature parameters to maintain gradient flow, and employ multiple restarts with space-filling initializations (Ament et al., 2023).
  • Implementation in open-source frameworks such as MOE supports scalable, GPU-accelerated MC q-EI (Wang et al., 2016).
  • For constrained, noisy, or multi-objective settings, the same log-smoothing principles extend to robust acquisition construction, avoiding degeneracy in derivative-free optimization (Ament et al., 2023).

7. Comparative Summary of Techniques

Method | Batch size / dimension scale | Complexity / cost | Empirical robustness
Analytic q-EI | Small $q$, small $d$ | Many multivariate CDF calls for the value, more for the gradient | Strong regret performance, costly
Stochastic gradient MC q-EI | Large $q$, any $d$ | Sampling-based, scalable | Similar regret, much faster
SIR (Accelerated EGO) | Large $q$, moderate $d$ | Candidate-pool cost per stage | Heuristic, near-optimal regret
LogEI-q (MC) | Any $q$, high $d$ | Sampling-based, stable | State of the art, robust gradients

Analytic approaches are preferred for small $q$, while MC- and SIR-based strategies scale efficiently to large batch sizes and higher dimensions. LogEI-q is recommended for robust, numerically stable acquisition maximization, especially in high-dimensional or constrained batch selection settings (Marmin et al., 2015, Wang et al., 2016, Ning et al., 2020, Ament et al., 2023).
