Parallel Expected Improvement (q-EI)

Updated 2 June 2026

Parallel Expected Improvement (q-EI) is a Bayesian optimization criterion that selects a batch of candidate points for parallel evaluation of expensive black-box objectives using a Gaussian process surrogate.
It integrates dependencies among batch points through multivariate normal distributions and employs analytic or stochastic gradient estimators for efficient optimization.
Log-transform reformulations and Monte Carlo methods enhance numerical stability and scalability, particularly in high-dimensional settings and large batch scenarios.

Parallel Expected Improvement (q-EI) is a Bayesian optimization acquisition criterion for selecting a batch of $q$ input locations to evaluate expensive black-box objectives in parallel, under a Gaussian process (GP) surrogate model. It quantifies the expected gain, relative to the current incumbent, from simultaneously evaluating several candidate points, thereby enabling efficient use of parallel computational resources in global optimization. Unlike sequential Expected Improvement (EI), the q-EI criterion internalizes dependencies and interaction effects among all batch points. Mathematical, computational, and numerical properties of q-EI have been extensively developed, with significant focus on analytic expansions, stochastic gradient estimators, and numerically robust reformulations for high-dimensional and large-batch regimes (Marmin et al., 2015, Wang et al., 2016, Ning et al., 2020, Ament et al., 2023).

1. Formal Definition and Analytic Structure

Given a set of $n$ completed evaluations $\{(x_i, y_i)\}_{i=1}^n$ and the best observed value $f^* = \min_{i} y_i$ (for minimization), the batch q-EI for new points $X = (x^{(1)}, \dots, x^{(q)})$ is defined as: $\mathrm{EI}_q(X) = \mathbb{E}\bigl[ \max\{0, f^* - \min_{i=1}^q f(x^{(i)}) \} \bigr]$ where the joint predictive distribution under the GP posterior is multivariate normal,

$f(X) \sim \mathcal{N}_q\left( \mu(X), \Sigma(X) \right)$

with $\mu(X) \in \mathbb{R}^q$ and $\Sigma(X) \in \mathbb{R}^{q \times q}$. This leads to an integral form: $\mathrm{EI}_q(X) = \int_{\mathbb{R}^q} (f^* - \min_i z_i)_+ \;\phi_q(z; \mu(X), \Sigma(X))\, dz$ where $\phi_q(\cdot; \mu(X), \Sigma(X))$ denotes the $q$-variate normal density.
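The integral form can be approximated directly by sampling from the joint posterior. The following is a minimal sketch, assuming the posterior mean vector and covariance matrix of the batch are already available; the function name `qei_monte_carlo` and the numeric values are illustrative and do not come from the cited papers.

```python
import numpy as np

def qei_monte_carlo(mu, sigma, f_best, n_samples=100_000, seed=0):
    """Monte Carlo estimate of q-EI (minimization) from GP posterior moments."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    rng = np.random.default_rng(seed)
    # Joint posterior draws f(X) ~ N_q(mu, Sigma); each row is one batch outcome.
    samples = rng.multivariate_normal(mu, sigma, size=n_samples)
    # Improvement of the best point in the batch over the incumbent, clipped at 0.
    improvement = np.maximum(0.0, f_best - samples.min(axis=1))
    return improvement.mean()

# Hypothetical posterior moments for a batch of q = 2 points.
mu = [0.2, 0.1]
sigma = [[0.30, 0.05],
         [0.05, 0.20]]
print(qei_monte_carlo(mu, sigma, f_best=0.0))
```

Because the minimum is taken over the whole batch inside the expectation, the estimate reflects the correlation between batch points rather than summing single-point EI values.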

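For practical computation, this can be decomposed as a sum of $q$ lower-dimensional quadrature integrals: $\mathrm{EI}_q(X) = \sum_{k=1}^{q} \int_{-\infty}^{f^*} (f^* - z)\, \phi(z; \mu_k, \Sigma_{kk})\, \Phi_{q-1}\bigl(a_k(z)\bigr)\, dz$ where $\phi(\cdot; \mu_k, \Sigma_{kk})$ is the univariate normal density of the $k$th coordinate, $\Phi_{q-1}$ is the $(q-1)$-dimensional normal CDF evaluated at a shifted and scaled argument $a_k(z)$ (giving the conditional probability that every other coordinate is at least $z$ given that the $k$th equals $z$), and each term corresponds to the event that the $k$th coordinate yields the minimum (Ning et al., 2020).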
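For $q = 2$ the inner CDF in each term is one-dimensional, so the decomposition can be checked with ordinary quadrature. The sketch below is an illustration under that assumption, not the implementation of Ning et al.; the helper names are hypothetical, the integration uses a plain trapezoid rule, and a non-degenerate covariance is assumed.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_pdf(z, mean, var):
    return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2.0 * math.pi * var)

def norm_cdf(u):
    return 0.5 * (1.0 + _erf(u / math.sqrt(2.0)))

def qei_decomposed_q2(mu, sigma, f_best, n_grid=20001):
    """q-EI for q = 2 as a sum of two one-dimensional integrals (minimization)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    total = 0.0
    for k in (0, 1):
        j = 1 - k  # index of the other batch point
        sd_k = math.sqrt(sigma[k, k])
        # Integrate over values z of coordinate k that improve on the incumbent.
        lower = min(mu[k], f_best) - 10.0 * sd_k
        z = np.linspace(lower, f_best, n_grid)
        # Law of coordinate j given that coordinate k equals z.
        cond_mean = mu[j] + sigma[j, k] / sigma[k, k] * (z - mu[k])
        cond_sd = math.sqrt(sigma[j, j] - sigma[j, k] ** 2 / sigma[k, k])
        # Probability that coordinate j is at least z, i.e. k attains the minimum.
        prob_k_is_min = norm_cdf((cond_mean - z) / cond_sd)
        integrand = (f_best - z) * norm_pdf(z, mu[k], sigma[k, k]) * prob_k_is_min
        # Trapezoid rule on the uniform grid.
        total += np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(z))
    return total

print(qei_decomposed_q2([0.2, 0.1], [[0.30, 0.05], [0.05, 0.20]], f_best=0.0))
```

For larger $q$ the inner probability becomes a genuine multivariate normal CDF, which is where the cost discussed in the next section arises.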

An alternative (maximization) form, as applied in (Marmin et al., 2015) and (Wang et al., 2016), conditions on the event that each batch point attains the maximum, resulting in analytic expansions via Tallis's formula and its extensions, where all terms are explicit functions of GP posterior moments and multivariate normal probabilities.

2. Computational Complexity and Scaling Properties

The direct computation of q-EI involves:

Multivariate normal CDF evaluations (in dimensions $q-1$ and $q$), whose cost per call grows with $q$ when using, e.g., Genz's adaptive integration.
Analytic gradients with respect to all $qd$ batch-location variables, which add further CDF calls for each batch evaluation if all terms are kept exact (Marmin et al., 2015).
Curse of dimensionality: Both the batch size ($q$) and input dimension ($d$) contribute multiplicatively to the size of the optimization landscape; practical exact evaluation becomes infeasible once $q$ or $d$ grows large (Ning et al., 2020).

Finite-difference gradients require on the order of $qd$ additional q-EI evaluations, each with its own CDF calls, making analytic or stochastic gradients necessary for efficiency (Marmin et al., 2015, Wang et al., 2016).

3. Gradient Estimation and Optimization Techniques

Closed-form analytic gradients have been derived for q-EI, leveraging Gaussian identities, conditional laws, and matrix algebra. For the analytic gradient, all derivatives of predictive means, covariances, univariate and multivariate normal functions are tractable (see explicit expressions in (Marmin et al., 2015)).
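The simplest instance of such a closed-form gradient is the single-point case $q = 1$, where EI and its derivatives with respect to the posterior mean and standard deviation follow from standard Gaussian identities. The sketch below covers only this special case; the function names are illustrative, and the chain rule from the posterior moments back to the input location is omitted.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ei_and_grad(mu, sd, f_best):
    """Single-point EI (minimization) and its partial derivatives in mu and sd."""
    z = (f_best - mu) / sd
    ei = sd * (z * normal_cdf(z) + normal_pdf(z))
    d_mu = -normal_cdf(z)   # dEI/dmu: the pdf terms cancel
    d_sd = normal_pdf(z)    # dEI/dsd: the z^2 * pdf terms cancel
    return ei, d_mu, d_sd

print(ei_and_grad(mu=0.3, sd=0.7, f_best=0.1))
```

The batch gradient in Marmin et al. (2015) is considerably more involved, since it also differentiates multivariate normal probabilities.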

As $q$ increases, Monte Carlo approaches become dominant:

Infinitesimal Perturbation Analysis (IPA) constructs an unbiased stochastic gradient estimator by differentiating through the batch maximum's realization, yielding

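$\nabla_X \mathrm{EI}_q(X) \approx \frac{1}{M} \sum_{m=1}^{M} \nabla_X h(X, Z^{(m)})$

where $h(X, Z) = \bigl(f^* - \min_{i} [\mu(X) + L(X) Z]_i\bigr)_+$ is the sample-wise batch improvement, $L(X)$ is the Cholesky factor of $\Sigma(X)$, and $Z^{(1)}, \dots, Z^{(M)}$ are i.i.d. draws from $\mathcal{N}(0, I_q)$ (Wang et al., 2016).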

Projected stochastic gradient ascent (SGA) with Polyak–Ruppert averaging is used for local maximization over candidate batches, converging under standard stochastic approximation conditions (Wang et al., 2016).

Importance-resampling (SIR) approaches reduce the burden by generating a candidate pool (via Sobol or Latin Hypercube sequences), weighting by classical EI, and drawing batch points in a way that heuristically targets high q-EI values, reducing per-stage CPU time to a cost that scales with the number of candidates (Ning et al., 2020).
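The pathwise (IPA) estimator and projected stochastic gradient ascent with Polyak–Ruppert averaging can be combined in a few lines on a toy problem. The sketch below makes strong simplifying assumptions that a real GP does not satisfy: inputs are scalar, the posterior mean is a known quadratic, and the covariance factor is held fixed, so only the mean contributes to the gradient. All names and constants are hypothetical.

```python
import numpy as np

def ipa_gradient(x, mean_fn, mean_grad_fn, chol, f_best, z):
    """Average pathwise gradient of the batch improvement over the draws in z.

    x: (q,) batch of scalar inputs; z: (M, q) standard normal draws.
    """
    y = mean_fn(x) + z @ chol.T              # (M, q) sampled batch outcomes
    k = y.argmin(axis=1)                     # index attaining the batch minimum
    y_min = y[np.arange(len(y)), k]
    active = y_min < f_best                  # samples with positive improvement
    grad = np.zeros_like(x)
    # d/dx_k (f_best - y_k) = -mean'(x_k) on active samples; other entries get 0.
    np.add.at(grad, k[active], -mean_grad_fn(x)[k[active]])
    return grad / len(z)

def projected_sga(x0, grad_est, lower, upper, n_steps=500, a=0.5, power=0.6,
                  n_draws=64, seed=0):
    """Projected stochastic gradient ascent returning the averaged iterate."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    x_avg = np.zeros_like(x)
    for t in range(1, n_steps + 1):
        z = rng.standard_normal((n_draws, x.size))
        step = a / t ** power                                  # decaying step size
        x = np.clip(x + step * grad_est(x, z), lower, upper)   # ascent, then box projection
        x_avg += (x - x_avg) / t                               # running Polyak-Ruppert average
    return x_avg

# Toy setting (assumed): quadratic posterior mean, fixed covariance factor.
center = 1.0
f_best = 0.5
chol = 0.3 * np.eye(2)

def mean_fn(x):
    return (x - center) ** 2

def mean_grad_fn(x):
    return 2.0 * (x - center)

def grad_est(x, z):
    return ipa_gradient(x, mean_fn, mean_grad_fn, chol, f_best, z)

x_star = projected_sga([0.2, 1.5], grad_est, lower=0.0, upper=2.0)
print(x_star)
```

In a full implementation the gradient would also flow through $L(X)$ and through the GP posterior mean, typically via automatic differentiation.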

4. Numerical Pathologies and Robust Reformulations

Canonical q-EI suffers from vanishing acquisition values and gradients in moderate-to-large data regimes:

As BO proceeds, posterior means rarely exceed the incumbent, causing improvement terms to be numerically zero for most batch configurations.
Analytic q-EI gradients decay rapidly in floating-point arithmetic for negative scaled improvement, often dropping below machine precision, especially in high dimensions or large batches.
Monte Carlo max operations have zero gradient almost everywhere due to the non-differentiable nature of the maximum (Ament et al., 2023).
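The first two pathologies are easy to reproduce in the single-point case. The sketch below evaluates the closed-form EI for unit posterior standard deviation at increasingly negative standardized improvements; the function name is illustrative.

```python
import math

def ei_standardized(z):
    """Closed-form single-point EI for unit posterior std, z = (f_best - mu) / sd."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return z * cdf + pdf

for z in (-1.0, -5.0, -10.0, -40.0):
    print(z, ei_standardized(z))
```

In double precision both the density and the CDF underflow well before $z = -40$, so the computed value and any gradient derived from it are exactly zero even though the true EI is strictly positive.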

The LogEI-q reformulation addresses these pathologies:

Applies a log-transformation to EI (and its batch generalization) for numerical stability, replacing the hard positive-part and max operations with softplus and smooth maximums (e.g., the temperature-scaled log-sum-exp $\tau \log \sum_i \exp(z_i / \tau)$).
Ensures strong, non-vanishing gradients even in regions where canonical q-EI stalls.
Has the same or approximately identical optima as canonical q-EI by monotonicity of log, and smoothing errors are controllable by temperature parameters ($\tau_0$, $\tau_{\max}$) (Ament et al., 2023).
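A log-space Monte Carlo estimate built from a softplus and a smooth maximum might look like the sketch below. It is an illustration of the idea described above rather than the reference implementation of Ament et al.; the parameter names `tau_0` and `tau_max`, their default values, and the helper functions are assumptions.

```python
import numpy as np

def logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)

def log_softplus(x, tau):
    """log(tau * softplus(x / tau)), kept finite for very negative x."""
    u = x / tau
    # For u << 0, softplus(u) equals exp(u) to machine precision, so its log is u.
    sp = np.logaddexp(0.0, np.maximum(u, -30.0))
    return np.log(tau) + np.where(u < -30.0, u, np.log(sp))

def q_log_ei(samples, f_best, tau_0=1e-3, tau_max=1e-2):
    """Log of a smoothed MC q-EI estimate (minimization); samples has shape (M, q)."""
    log_impr = log_softplus(f_best - samples, tau_0)              # (M, q) log improvements
    log_batch = tau_max * logsumexp(log_impr / tau_max, axis=1)   # smooth max over the batch
    return logsumexp(log_batch, axis=0) - np.log(len(samples))    # log of the sample mean

rng = np.random.default_rng(0)
near = rng.standard_normal((20000, 2))   # posterior draws close to the incumbent
far = near + 50.0                        # draws far above the incumbent
print(q_log_ei(near, f_best=0.5), q_log_ei(far, f_best=0.5))
```

For the `far` draws a plain MC q-EI estimate is exactly zero, so its logarithm is undefined, whereas the smoothed log-space value stays finite and still varies with the samples.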

5. Algorithmic Implementations and Empirical Performance

Closed-form q-EI maximization involves multistart BFGS/quasi-Newton in $qd$ dimensions, with analytic gradients enabling tractable optimization for small batch sizes (Marmin et al., 2015). For larger $q$ or higher $d$, stochastic gradient-based maximization or SIR-based batch assembly becomes essential:

MC-based stochastic optimizers (as in MOE) draw many MC samples to estimate q-EI and its gradient, using restarts and step-size schedules (Wang et al., 2016).
LogEI-q can be optimized directly using autodiff-enabled libraries with MC sampling, gradient smoothing, and robust handling of constraints and noise (Ament et al., 2023).
Accelerated EGO via SIR randomly samples well-spread candidate batches, weights them by one-point EI, and assembles diverse, high-EI batches with low clustering and competitive empirical regret, substantially reducing computational cost and iteration count in reported experiments with batch sizes and input dimensions up to roughly 12 to 15 (Ning et al., 2020).
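One plausible reading of the SIR-style batch assembly is sketched below: build a space-filling candidate pool, weight each candidate by single-point EI, and resample $q$ distinct points in proportion to those weights. The published procedure may differ in its details; the posterior functions `post_mean` and `post_sd`, the pool size, and all helper names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def latin_hypercube(n, d, rng):
    """n points in [0, 1]^d with one point per stratum in every dimension."""
    u = (rng.random((n, d)) + np.arange(n)[:, None]) / n
    for j in range(d):
        u[:, j] = rng.permutation(u[:, j])
    return u

def expected_improvement(mean, sd, f_best):
    """Single-point EI (minimization), vectorized over candidates."""
    z = (f_best - mean) / sd
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    return sd * (z * cdf + pdf)

def sir_batch(post_mean, post_sd, f_best, q, d, n_candidates=2000, seed=0):
    """Draw q distinct candidates with probability proportional to one-point EI."""
    rng = np.random.default_rng(seed)
    cand = latin_hypercube(n_candidates, d, rng)
    w = np.clip(expected_improvement(post_mean(cand), post_sd(cand), f_best), 0.0, None)
    idx = rng.choice(n_candidates, size=q, replace=False, p=w / w.sum())
    return cand[idx]

# Toy posterior (assumed): mean is a bowl centred at 0.5, constant predictive std.
def post_mean(x):
    return np.sum((x - 0.5) ** 2, axis=1)

def post_sd(x):
    return np.full(len(x), 0.1)

batch = sir_batch(post_mean, post_sd, f_best=0.05, q=4, d=2)
print(batch)
```

Sampling without replacement from a well-spread pool is what keeps the batch from collapsing onto a single high-EI location; no multivariate CDF is evaluated at any point.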

Empirically:

Analytic q-EI outperforms heuristic BUCB in one-step and worst-case regret but is more computationally expensive; SIR and MC-based methods recover similar or better performance at a fraction of the wall-clock cost (Marmin et al., 2015, Ning et al., 2020).
LogEI-q achieves 2–5x lower regret than classic MC q-EI in high-dimensional batch settings and avoids zero-gradient stagnation (Ament et al., 2023).

6. Practical Considerations and Recommendations

For small $q$, both analytic and MC approaches are practical, but IPA-based MC gradient ascent is easily parallelized and extensible (Wang et al., 2016).
For moderate to large $q$, MC-based or SIR-based batch construction, with robustified acquisition (LogEI-q), should be preferred.
Numerical stability is critical: Use log-transformed EI and smooth max/plus operators, tune temperature parameters to maintain gradient flow, and employ multiple restarts with space-filling initializations (Ament et al., 2023).
Implementation in open-source frameworks such as MOE supports scalable, GPU-accelerated MC q-EI (Wang et al., 2016).
For constrained, noisy, or multi-objective settings, the same log-smoothing principles extend to robust acquisition construction, avoiding degeneracy in derivative-free optimization (Ament et al., 2023).

7. Comparative Summary of Techniques

Method	Batch Size/Dim Scale	Complexity/Cost	Empirical Robustness
Analytic q-EI	Small $q$, small $d$	Multivariate CDF calls for value and gradient	Strong regret performance, costly
Stochastic Grad. MC q-EI	$q$ large, any $d$	MC sampling, scalable	Similar regret, much faster
SIR (Accelerated EGO)	$q$ large, moderate $d$	One-point EI over a candidate pool per stage	Heuristic, near-optimal regret
LogEI-q (MC)	Any $q$, high $d$	MC sampling, stable	State-of-the-art, robust gradients

Analytic approaches are preferred for small $q$, while MC- and SIR-based strategies scale efficiently to large batch sizes and higher dimensions. LogEI-q is recommended for robust, numerically stable acquisition maximization, especially in high-dimensional or constrained batch selection settings (Marmin et al., 2015, Wang et al., 2016, Ning et al., 2020, Ament et al., 2023).


References (4)

Marmin et al. (2015). Differentiating the multipoint Expected Improvement for optimal batch design.

Wang et al. (2016). Parallel Bayesian Global Optimization of Expensive Functions.

Ning et al. (2020). Batch Sequential Adaptive Designs for Global Optimization.

Ament et al. (2023). Unexpected Improvements to Expected Improvement for Bayesian Optimization.
