Global Optimization Approach

Updated 11 November 2025
  • A global optimization approach is a method for identifying the global extremum of a function over multimodal, high-dimensional landscapes using uncertainty-aware models.
  • The strategy employs Bayesian surrogates and an entropy-based acquisition function to quantify information gain under noisy evaluations.
  • A virtual batch mechanism stabilizes point selection by reducing estimator variance, proving effective in engineering applications like renewable energy integration.

A global optimization approach encompasses the foundations, algorithms, and implementation principles for identifying the global extremum (minimum or maximum) of a mathematical function—typically under weak regularity assumptions and with little a priori knowledge of the function's structure. Unlike local optimization, which locates a stationary point in a small neighborhood, global optimization aims to find the globally best solution over a potentially high-dimensional, multimodal, or black-box landscape. Approaches span deterministic methods with rigorous guarantees, metaheuristics inspired by natural processes, probabilistic and Bayesian sequential strategies, and advanced hybridizations adapted to various black-box or noisy-evaluation settings.

1. Probabilistic and Information-Theoretic Foundations

A major strand of global optimization is rooted in statistical inference, prominently Bayesian sequential search under uncertainty. When the function $f: X \to \mathbb{R}$ is expensive to evaluate (e.g., costly simulations or experiments), Bayesian surrogates such as Gaussian processes (GPs) are employed to model $f$ and quantify uncertainty after $n$ observations $\mathcal{D}_n = \{(X_i, y_i)\}_{i=1}^n$. Under a GP prior:

  • The posterior at $x$ is $\mathcal{N}(\mu_n(x), s_n^2(x))$, with the mean and variance given by the standard Kriging (GP regression) formulas.
  • Evaluation noise is modeled as $y_i = f(X_i) + \epsilon_i$, $\epsilon_i \sim \mathcal{N}(0, \sigma_\epsilon^2)$, with potentially large noise variance.

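The posterior mean and variance above follow from standard GP (Kriging) regression. The snippet below is a minimal numpy sketch under assumptions not specified in the source: a squared-exponential kernel, fixed hyperparameters, and a known homoscedastic noise variance.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2, signal_var=1.0):
    # Squared-exponential covariance between two 1-D input arrays (assumed kernel).
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_obs, y_obs, x_cand, noise_var, **kern):
    # Posterior mean mu_n and covariance over candidate points, conditioning on
    # noisy observations y_i = f(X_i) + eps_i with variance noise_var.
    K = rbf_kernel(x_obs, x_obs, **kern) + noise_var * np.eye(len(x_obs))
    K_star = rbf_kernel(x_cand, x_obs, **kern)
    K_cand = rbf_kernel(x_cand, x_cand, **kern)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    v = np.linalg.solve(L, K_star.T)
    mu = K_star @ alpha        # mu_n(x) at each candidate
    cov = K_cand - v.T @ v     # diagonal gives s_n^2(x) at each candidate
    return mu, cov
```
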
Rather than maximizing a simple acquisition function (as in expected improvement), the informational approach targets maximum expected information gain regarding the location of the global optimizer:

  • For a discrete candidate set $X = \{x_1, \dots, x_m\}$, define the random variable $X^* = \operatorname{argmin}_{x \in X} f(x)$.
  • The Shannon entropy of the minimizer is $H[X^*] = -\sum_{i=1}^m P_n(X^* = x_i) \log P_n(X^* = x_i)$.
  • The acquisition function is the expected reduction in minimizer entropy due to a new evaluation at $x$:

$$\Delta(x) = H[X^*] - \mathbb{E}_{y \sim p_n(y \mid x)}\!\left[ H[X^* \mid \mathcal{D}_n \cup \{(x, y)\}] \right]$$

  • Numerically, $\Delta(x)$ is approximated via quadrature on the GP-predicted $y$ distribution and conditional GP simulations to compute the post-evaluation minimizer distributions.
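
One simple way to realize this numerically, consistent with the description above, is to draw joint samples from the GP posterior over the candidate grid and estimate $P_n(X^* = x_i)$ by counting which candidate attains the minimum in each sample. This sketch reuses the hypothetical `gp_posterior` helper from the earlier snippet and is not the authors' implementation.

```python
def minimizer_entropy(mu, cov, n_samples=2000, jitter=1e-8, rng=None):
    # Monte Carlo estimate of H[X*]: sample joint GP realizations over the
    # candidate grid, locate each realization's argmin, and take the entropy
    # of the resulting empirical distribution of minimizer locations.
    rng = np.random.default_rng() if rng is None else rng
    L = np.linalg.cholesky(cov + jitter * np.eye(len(mu)))
    f_samples = mu + rng.standard_normal((n_samples, len(mu))) @ L.T
    counts = np.bincount(f_samples.argmin(axis=1), minlength=len(mu))
    p = counts[counts > 0] / n_samples
    return float(-np.sum(p * np.log(p)))
```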

2. Algorithmic Framework: Virtual Batch Stabilization

With very noisy evaluations (large $\sigma_\epsilon^2$), the entropy-reduction signal per evaluation is small, and the variance of its Monte Carlo estimate can overwhelm the true objective signal. To address this, the underlying paper introduces a virtual batch mechanism:

  • For next-point selection, imagine $K$ independent future evaluations at $x$, so that only their average $\bar{y}$ matters, distributed as $\mathcal{N}(\mu_n(x), s_n^2(x) + \sigma_\epsilon^2/K)$.
  • The criterion becomes:

$$\Delta_K(x) = H[X^*] - \int H[X^* \mid \mathcal{D}_n \cup \{(x, \bar{y})\}]\; \mathcal{N}\!\left(\bar{y};\, \mu_n(x),\, s_n^2(x) + \sigma_\epsilon^2/K\right) d\bar{y}$$

  • As $K \to \infty$, the virtual observation variance shrinks, leading to a more stable entropy-reduction estimate, even if only a single real evaluation is performed at the chosen $x$.

This stabilization is essential for robust performance under heavy noise and makes the sequential decision process less susceptible to Monte Carlo estimator randomness.
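
As a rough illustration of how $\Delta_K(x)$ might be estimated, the sketch below (building on the hypothetical helpers above) integrates over the predicted batch mean $\bar{y} \sim \mathcal{N}(\mu_n(x), s_n^2(x) + \sigma_\epsilon^2/K)$ with Gauss-Hermite quadrature and conditions the candidate-grid posterior on each quadrature value via a rank-one Gaussian update. The quadrature order and sample sizes are illustrative assumptions.

```python
def delta_K(i, mu, cov, noise_var, K=np.inf, n_quad=9, n_samples=2000, rng=None):
    # Expected reduction in minimizer entropy from a virtual batch of K
    # noisy evaluations at candidate index i.
    h_now = minimizer_entropy(mu, cov, n_samples, rng=rng)
    eff_noise = 0.0 if np.isinf(K) else noise_var / K   # variance of the batch-mean noise
    pred_var = cov[i, i] + eff_noise                    # y_bar ~ N(mu_i, s_i^2 + sigma^2/K)
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_quad)
    weights = weights / weights.sum()                   # normalize to a probability measure
    h_next = 0.0
    for z, w in zip(nodes, weights):
        y_bar = mu[i] + np.sqrt(pred_var) * z
        # Condition the joint Gaussian over the candidate grid on y_bar (rank-one update).
        mu_post = mu + cov[:, i] * (y_bar - mu[i]) / pred_var
        cov_post = cov - np.outer(cov[:, i], cov[i, :]) / pred_var
        h_next += w * minimizer_entropy(mu_post, cov_post, n_samples, rng=rng)
    return h_now - h_next
```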

The sequential algorithm is as follows:

  1. Initialize with $n_0$ design points.
  2. Repeat:
    • Fit GP to data, compute posterior mean/variance.
    • For each candidate $x$, discretize the posterior for $\bar{y}$, simulate conditional GP sample paths, and form entropy estimates.
    • Select $x_{n+1} = \arg\min_x J_n(x)$, where $J_n(x)$ is the expected post-evaluation minimizer entropy (i.e., $x_{n+1}$ maximizes $\Delta_K$).
    • Perform $K_0$ real evaluations at $x_{n+1}$ and augment the data.
    • Increment $n$ and repeat until the experimental or computational budget is exhausted.
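
Putting the pieces together, a minimal loop consistent with these steps might look as follows, again using the hypothetical helpers from the previous sketches; `simulator` stands in for one noisy evaluation of the objective, and the default budget is reduced so a naive run stays fast.

```python
def optimize(simulator, x_cand, noise_var, n0=5, budget=200,
             K_select=np.inf, K0=10, rng=None):
    # Sequential design sketch: fit the GP, pick the candidate maximizing Delta_K
    # with a large virtual K, then spend K0 real noisy evaluations at that point.
    rng = np.random.default_rng() if rng is None else rng
    x_obs = rng.choice(x_cand, size=n0, replace=False)
    y_obs = np.array([simulator(x) for x in x_obs])
    while len(y_obs) < budget:
        mu, cov = gp_posterior(x_obs, y_obs, x_cand, noise_var)
        scores = [delta_K(i, mu, cov, noise_var, K=K_select, rng=rng)
                  for i in range(len(x_cand))]
        i_next = int(np.argmax(scores))                 # maximizes Delta_K
        x_obs = np.append(x_obs, [x_cand[i_next]] * K0)
        y_obs = np.append(y_obs, [simulator(x_cand[i_next]) for _ in range(K0)])
    mu, _ = gp_posterior(x_obs, y_obs, x_cand, noise_var)
    return x_cand[int(np.argmin(mu))]                   # posterior-mean minimizer
```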

3. Noise Regimes, Estimator Variance, and Trade-offs

With small or moderate evaluation noise, the entropy-based information gain per evaluation is significant and single-evaluation-based selection is effective. However, in high-noise regimes:

  • Empirical estimation of $J_n(x)$ (the expected minimizer entropy after a new evaluation) suffers high variance, scaling as $\sim 1/M$ with the Monte Carlo sample size $M$.
  • The virtual batch approach artificially sharpens the response of the acquisition function to the candidate $x$, better discriminating between choices.

There is a trade-off:

  • A larger $K$ gives more stable point selection but may underemphasize single-sample noise; $K$ should be large enough that the selection step is dominated by global structure rather than estimator variance.
  • In practice, the method is robust even when only $K_0 = 10$ real evaluations are performed per step; using a large virtual $K$ for selection (e.g., $K = \infty$) yields better results than setting $K = K_0$. An illustrative calculation follows below.
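
To make the trade-off concrete, consider illustrative numbers (not values reported in the source): with posterior variance $s_n^2(x) = 0.01$ and noise variance $\sigma_\epsilon^2 = 1.0$, the look-ahead variance of the batch mean used during selection shrinks rapidly with the virtual $K$.

```python
import numpy as np

s2, noise_var = 0.01, 1.0                      # illustrative values, not from the paper
for K in (1, 10, 100, np.inf):
    eff = 0.0 if np.isinf(K) else noise_var / K
    print(f"K={K}: var(y_bar) = {s2 + eff:.3f}")
# K=1:   1.010  -> simulated y_bar dominated by observation noise
# K=10:  0.110
# K=100: 0.020
# K=inf: 0.010  -> y_bar tracks only the epistemic GP uncertainty
```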

4. Application in Engineering: Renewable Energy Integration

The approach was quantitatively validated on a renewable energy integration problem:

  • The design parameter $x \in [-1, 0]$ reflects strategies for ten-year integration of renewables by a Distribution System Operator.
  • The true objective is $f(x) = \mathbb{E}_S[C(x, S)]$ over random scenarios $S$, with each simulation run yielding one cost observation $y_i = C(X_i, S_i)$.
  • With a simulation budget of 2,000 evaluations (on a grid of 51 values of $x$) and batch size $K_0 = 10$, several strategies were compared:
    • IID random sampling,
    • Original IAGO ($K = 10$),
    • IAGO with infinite $K$ (virtual batch approach).

Numerical results over 500 runs show that IAGO with $K = \infty$ more rapidly reduces both minimizer entropy and localization error. Even after all 2,000 runs, significant epistemic uncertainty remains, but the virtual-batch approach consistently outperforms both the original information-based strategy and IID sampling.

Empirical observation: Artificially inflating the batch size in the selection phase is effectively a variance reduction technique for the acquisition function, critical for robust decision-making under highly noisy measurements.
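
The cost model $C(x, S)$ is not reproduced here, so the snippet below is only a hypothetical stand-in showing how the setup above (a 51-point grid on $[-1, 0]$, batch size $K_0 = 10$, one noisy cost per scenario) could be wired to the sketch from Section 2; the quadratic-plus-sine cost and the noise level are invented for illustration.

```python
rng = np.random.default_rng(0)
x_cand = np.linspace(-1.0, 0.0, 51)            # 51-point candidate grid on [-1, 0]
noise_sd = 0.5                                 # assumed scenario-to-scenario spread

def simulator(x):
    # One noisy cost observation y = C(x, S) for a random scenario S (hypothetical model).
    return (x + 0.6) ** 2 + 0.1 * np.sin(8.0 * x) + noise_sd * rng.standard_normal()

x_star = optimize(simulator, x_cand, noise_var=noise_sd**2,
                  budget=200, K_select=np.inf, K0=10, rng=rng)
print("estimated minimizer:", x_star)
```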

5. Implementation and Computational Considerations

Resource requirements are governed by:

  • The number of candidate points evaluated per selection step (typically a dense grid),
  • The Monte Carlo sample size $M$ for conditional simulations per quadrature point,
  • The cost of recomputing GP conditionals, which is tractable for moderate discretizations and $M$ but demanding for high-dimensional inputs.

The method scales well for low/moderate-dimensional problems with expensive function evaluations, where the evaluation budget is limited and each decision's information yield must be maximized.

Limitations include:

  • The approach is less tractable in continuous, high-dimensional domains without efficient surrogate models.
  • The added machinery pays off mainly when evaluation noise is high and standard Bayesian optimization acquisitions such as Expected Improvement become unreliable due to estimator variance; in low-noise settings, simpler criteria may suffice.

6. Relation to Other Optimization Paradigms

The informational approach described here is a direct extension of entropy-reduction methods for sequential experiment design, distinct from classical acquisition strategies. It provides a consistent Bayesian framework for global optimization and is particularly suited to robust optimization and robust design under uncertainty. The method contrasts with purely heuristic or metaheuristic global optimization, which generally lacks explicit uncertainty quantification or rigorous information-theoretic prioritization of evaluation points.

The technique is also closely related to other GP-based optimization methods with acquisition functions adapted for noise, but it introduces an entropy-centric perspective that is more directly aligned with the learning goal of minimizer localization, rather than pointwise improvement.

7. Summary Table: Key Components

| Component | Description | Notable Formula / Output |
| --- | --- | --- |
| Surrogate model | Gaussian process prior/posterior with known noise variance | $\mu_n(x)$, $s_n^2(x)$ |
| Acquisition function | Expected reduction in minimizer entropy (information gain) | $\Delta_K(x) = H[X^*] - \int H[X^* \mid \mathcal{D}_n \cup \{(x, \bar{y})\}]\, p(\bar{y} \mid x)\, d\bar{y}$ |
| Virtual batch trick | Selection acts as if $K$ noisy samples were averaged per candidate | Reduces estimator variance, stabilizes selection |
| Sequential algorithm | Fit GP, optimize $\Delta_K$, evaluate batch of $K_0$, update data | Loop detailed in Section 2 |
| Primary application | High-noise, expensive simulation settings (e.g., engineering design) | Renewable energy integration test case |

In summary, the global optimization approach described here leverages Bayesian Gaussian process surrogates and an entropy-reduction acquisition function, augmented by a virtual batch strategy that is essential for stability in very noisy observation regimes. This approach provides a rigorous, informative, and robust sequential decision-making protocol for optimizing expensive, stochastic systems.
