ParEGO Algorithm for Multi-Objective Optimization

Updated 23 June 2026

ParEGO is a scalarization-based Bayesian optimization algorithm that transforms multi-objective problems into a series of single-objective steps to approximate the Pareto front.
It employs random weight scalarizations, Gaussian Process modeling, and Expected Improvement to achieve data-efficient searches under tight evaluation budgets.
Interactive extensions of ParEGO, such as TRIPE and WAPE, incorporate decision-maker feedback to focus the search on preferred regions and enhance convergence.

ParEGO (Pareto Efficient Global Optimization) is a scalarization-based Bayesian optimization algorithm for multi-objective black-box optimization problems under limited evaluation budgets. It transforms multi-objective optimization—where the aim is to approximate the Pareto front of non-dominated solutions—into a series of single-objective optimization steps via random scalarizations, managing resource constraints while enabling data-efficient search for optimal trade-offs. ParEGO has influenced both theory and practical methodology in interactive multi-objective optimization, and its extensions address the challenge of eliciting and exploiting decision-maker preferences in expensive regimes (Heidari et al., 2024, Ungredda et al., 2021, Mamun et al., 2024).

1. Multi-Objective Problem Formulation

Given a decision space $X \subset \mathbb{R}^n$ and $m$ objective functions $f(x) = (f_1(x), \ldots, f_m(x))$, $x \in X$, all to be minimized, the Pareto set $P^* \subset X$ comprises the solutions not dominated by any other point, where $x^1$ dominates $x^2$ (written $x^1 \prec x^2$) under the partial order:

$x^1 \prec x^2 \iff \forall i: f_i(x^1) \leq f_i(x^2)\ \text{and}\ \exists j: f_j(x^1) < f_j(x^2).$

The Pareto front is the image $f(P^*)$ of the Pareto set in objective space. In practical scenarios, exhaustive approximation of $P^*$ is infeasible because each evaluation is expensive. Instead, sampled non-dominated points are iteratively presented to a decision-maker (DM), who selects solutions aligning with their (typically hidden) preferences (Heidari et al., 2024).
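The dominance relation above can be checked directly on a finite set of evaluated objective vectors. The following is a minimal sketch, assuming minimization of every objective; the function names `dominates` and `non_dominated` and the sample data are illustrative and do not come from the cited papers.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a dominates b (all objectives minimized)."""
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    strictly_better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates (simple O(N^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative objective vectors: (3, 3) is dominated by (2, 2).
observed = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = non_dominated(observed)
```

The quadratic scan is adequate for the small evaluation budgets this setting assumes; larger archives would call for a sort-based filter.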

2. ParEGO Core Methodology

ParEGO operates by scalarizing multiple objectives with randomly sampled weight vectors, fitting a GP surrogate on the resulting scalarized values, and directing Bayesian optimization via acquisition functions:

Scalarization: Augmented Weighted Tchebycheff Function

For a random weight vector $w \in \Delta^m$, the standard simplex of non-negative weights summing to one, and a reference point $z^* = (z_1^*, \ldots, z_m^*)$ (often $z_i^* = \min f_i$ over observed data), define:

$g(x; w) = \max_{i=1,\ldots,m} w_i \left(f_i(x) - z_i^*\right) + \rho \sum_{i=1}^{m} w_i \left(f_i(x) - z_i^*\right)$

where $\rho > 0$ (e.g., 0.01–0.05) avoids ties and rules out weakly Pareto-optimal minimizers. Minimizing $g(x; w)$ for varying $w$ yields Pareto-optimal solutions across the front (Ungredda et al., 2021, Heidari et al., 2024, Mamun et al., 2024).
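As a worked illustration, the sketch below evaluates an augmented weighted Tchebycheff value for one objective vector. It assumes the common form: the maximum of the weighted deviations $w_i (f_i - z_i^*)$ plus $\rho$ times their sum, with the reference point taken as the component-wise minimum of the observed objectives. The function names and the numbers are illustrative only.

```python
from typing import List, Sequence


def ideal_point(observed: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise minimum of the observed objective vectors."""
    return [min(column) for column in zip(*observed)]


def augmented_tchebycheff(f: Sequence[float], w: Sequence[float],
                          z_star: Sequence[float], rho: float = 0.05) -> float:
    """Max weighted deviation from z_star plus rho times the weighted sum."""
    terms = [wi * (fi - zi) for wi, fi, zi in zip(w, f, z_star)]
    return max(terms) + rho * sum(terms)


observed = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
z_star = ideal_point(observed)          # [1.0, 1.0]
weights = (0.5, 0.5)

# Weighted deviations for (2, 4) are 0.5 and 1.5, so the value is
# 1.5 + 0.05 * 2.0 = 1.6.
value = augmented_tchebycheff((2.0, 4.0), weights, z_star)
scalarized = [augmented_tchebycheff(f, weights, z_star) for f in observed]
```

With equal weights the balanced point (2, 2) receives the lowest scalarized value of the three observations; changing `weights` moves the minimizer along the front.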

Surrogate Modeling: Gaussian Processes

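A GP surrogate $\hat{g}(x)$ with covariance kernel $k(x, x')$ is fit to the scalarized objective values. The canonical kernel is the squared-exponential (SE) or Matérn 5/2, sometimes with an additive linear kernel, for example:

$k(x, x') = \sigma_f^2 \exp\left(-\frac{\|x - x'\|^2}{2\ell^2}\right) + \sigma_l^2\, x^\top x'$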

Hyperparameters are estimated by maximizing GP marginal likelihood.
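A squared-exponential kernel with an optional additive linear term might be written as follows. This is a sketch with fixed hyperparameters; the parameter names `signal_var`, `length_scale`, and `linear_var` are illustrative, and in practice their values would come from marginal-likelihood maximization rather than being set by hand.

```python
import math
from typing import List, Sequence


def se_plus_linear(x: Sequence[float], y: Sequence[float],
                   signal_var: float = 1.0, length_scale: float = 1.0,
                   linear_var: float = 0.0) -> float:
    """Squared-exponential kernel plus an optional linear (dot-product) term."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    se = signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)
    linear = linear_var * sum(a * b for a, b in zip(x, y))
    return se + linear


def gram_matrix(points: Sequence[Sequence[float]], **params: float) -> List[List[float]]:
    """Pairwise kernel values for a list of input points."""
    return [[se_plus_linear(p, q, **params) for q in points] for p in points]


points = [(0.0,), (0.5,), (1.0,)]
K = gram_matrix(points, length_scale=0.5)
```

Setting `linear_var` to zero recovers the plain SE kernel, which makes it easy to compare both choices on the same data.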

Acquisition: Expected Improvement

For data $D = \{(x_i, g(x_i; w))\}_{i=1}^{N}$, compute EI on the GP posterior with mean $\mu(x)$ and standard deviation $\sigma(x)$:

$\mathrm{EI}(x) = \left(g_{\min} - \mu(x)\right) \Phi(u) + \sigma(x)\, \phi(u), \quad u = \frac{g_{\min} - \mu(x)}{\sigma(x)}$

where $g_{\min} = \min_i g(x_i; w)$ is the best observed scalarized value, and $\Phi$, $\phi$ are the standard normal CDF and PDF (Ungredda et al., 2021).
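The closed-form Expected Improvement for a minimization problem can be computed from a posterior mean, a posterior standard deviation, and the best observed scalarized value. The sketch below uses only the standard library; the function name and the example inputs are illustrative.

```python
import math


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Closed-form EI for minimization given posterior mean and std."""
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is deterministic.
        return max(best - mu, 0.0)
    u = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


# When the predicted mean equals the incumbent, EI reduces to sigma * pdf(0).
ei_at_incumbent = expected_improvement(mu=1.0, sigma=1.0, best=1.0)
ei_uncertain = expected_improvement(mu=1.5, sigma=2.0, best=1.0)
ei_confident = expected_improvement(mu=1.5, sigma=0.1, best=1.0)
```

The last two calls show the exploration effect: a candidate with a worse predicted mean still earns a noticeable EI when its uncertainty is large, and almost none when the model is confident.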

Algorithmic Workflow

Step	Description
Initialization	Generate $N_0$ initial samples (e.g., space-filling design)
Scalarization	Draw random $w \in \Delta^m$; compute $g(x_i; w)$ for all evaluated $x_i$
GP Fit	Fit GP on $\{(x_i, g(x_i; w))\}$
Acquisition	Maximize EI over $X$; select $x_{\text{new}}$
Evaluation	Evaluate $f(x_{\text{new}})$; augment data and repeat

Final output is the set of non-dominated solutions in the evaluated data $D$.
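The workflow can be sketched end to end on a toy problem. The code below is a simplified illustration and not the reference implementation: it assumes a one-dimensional decision variable on [0, 1], a hypothetical pair of cheap objectives, a GP with fixed hyperparameters (no marginal-likelihood fit), and EI maximization over random candidates instead of L-BFGS or CMA-ES.

```python
import math
import random


def objectives(x):
    # Hypothetical cheap stand-in for an expensive black box (both minimized).
    return (x ** 2, (x - 1.0) ** 2)


def scalarize(f, w, z, rho=0.05):
    # Augmented weighted Tchebycheff value relative to reference point z.
    t = [wi * (fi - zi) for wi, fi, zi in zip(w, f, z)]
    return max(t) + rho * sum(t)


def kernel(a, b, ell=0.3):
    # Squared-exponential kernel with a fixed, assumed length scale.
    return math.exp(-0.5 * (a - b) ** 2 / ell ** 2)


def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper(L, y):
    # Solves L^T x = y using the lower-triangular factor L.
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit(X, y, noise=1e-4):
    mean = sum(y) / len(y)
    K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper(L, solve_lower(L, [yi - mean for yi in y]))
    return L, alpha, mean


def gp_predict(model, X, xq):
    L, alpha, mean = model
    ks = [kernel(xq, a) for a in X]
    v = solve_lower(L, ks)
    mu = mean + sum(k * a for k, a in zip(ks, alpha))
    var = max(kernel(xq, xq) - sum(vi * vi for vi in v), 1e-12)
    return mu, math.sqrt(var)


def expected_improvement(mu, sigma, best):
    u = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def parego(n_init=5, n_iter=10, n_cand=200, seed=0):
    rng = random.Random(seed)
    X = [rng.random() for _ in range(n_init)]        # initial design
    F = [objectives(x) for x in X]
    for _ in range(n_iter):
        u = rng.random()
        w = (u, 1.0 - u)                              # random weight on the simplex
        z = [min(col) for col in zip(*F)]             # ideal point from the data
        g = [scalarize(f, w, z) for f in F]           # scalarize all observations
        model = gp_fit(X, g)                          # surrogate on scalarized values
        best = min(g)
        cands = [rng.random() for _ in range(n_cand)]
        x_new = max(cands, key=lambda c: expected_improvement(
            *gp_predict(model, X, c), best))
        X.append(x_new)                               # evaluate and augment
        F.append(objectives(x_new))
    return X, F


X, F = parego()
front = [f for f in F if not any(
    all(a <= b for a, b in zip(q, f)) and any(a < b for a, b in zip(q, f))
    for q in F)]
```

Because a fresh weight vector is drawn in every iteration, the surrogate is refit on newly scalarized values each time, which is what spreads the selected points along the front.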

3. Interactive and Data-Efficient Extensions

Elicitation of Preferences

Conventional ParEGO explores the entire Pareto front, often oversampling regions irrelevant to the DM. To address this, interactive extensions incorporate DM feedback, focusing search on the locally preferred region after the DM selects a "favorite" trade-off point (Heidari et al., 2024).

TRIPE: Triangulation-Based Region Exploration

TRIPE restricts exploration to the neighborhood of the DM's selected design:

Construct Delaunay triangulation on evaluated points.
Identify simplices containing the current preferred point.
Generate candidates as simplex centroids and convex-hull facet centers adjacent to the selection.
Evaluate at these neighbors, updating the front and eliciting new DM preferences.

TRIPE is hyperparameter-free but scales poorly as the dimension grows, due to triangulation complexity (Heidari et al., 2024).

WAPE: Weight-Adjustment-Based Exploration

Rather than restricting the search to a local neighborhood of the selected design, WAPE samples new weights near the weight vector $w^*$ associated with the DM's selected point. Specifically:

Perturb $w^*$ multiplicatively by a random factor controlled by a spread parameter.
Normalize each new $\tilde{w}$ to the simplex: $w' = \tilde{w} / \sum_j \tilde{w}_j$.
For each $w'$, run the standard ParEGO GP/EI step and evaluate new points, with subsequent DM feedback.
WAPE efficiently drives convergence toward the preferred region, outpacing both the baseline and TRIPE in empirical tests (Heidari et al., 2024).
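The weight-adjustment step can be illustrated as follows. The exact perturbation distribution is not specified here, so this sketch assumes a hypothetical factor of the form 1 + delta * u with u drawn uniformly from [-1, 1]; the names `perturb_weights`, `delta`, and `n_new` are illustrative and are not taken from Heidari et al.

```python
import random
from typing import List, Sequence


def perturb_weights(w_star: Sequence[float], delta: float = 0.2,
                    n_new: int = 5, seed: int = 0) -> List[List[float]]:
    """Sample weight vectors near w_star and renormalize them to the simplex.

    Assumption: each component is scaled by (1 + delta * u), u ~ U(-1, 1),
    with 0 <= delta < 1 so that the perturbed weights stay positive.
    """
    rng = random.Random(seed)
    new_weights = []
    for _ in range(n_new):
        raw = [wi * (1.0 + delta * rng.uniform(-1.0, 1.0)) for wi in w_star]
        total = sum(raw)
        new_weights.append([r / total for r in raw])
    return new_weights


w_selected = (0.6, 0.3, 0.1)   # weights tied to the DM's favorite point
nearby = perturb_weights(w_selected)
```

A smaller `delta` keeps the new weights, and therefore the next evaluations, closer to the DM's current choice; a larger value widens the explored region.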

4. Algorithmic Variants: Batched and Noisy ParEGO

Parallel (Batch) ParEGO

ParEGO generalizes to batch selection (batch size $q > 1$) by sequentially applying the acquisition function, conditioning on points already chosen in the current batch. Each iteration involves fitting GPs, sampling weights, computing the current-best scalarization, and selecting $q$ maximizers iteratively (Mamun et al., 2024).

Noisy ParEGO (qNparEGO)

In the presence of noisy observations, qNparEGO integrates over GP posterior uncertainty when defining the incumbent best, yielding an acquisition function that averages improvement over both the posterior at candidate points and the uncertain Pareto frontier. Monte Carlo sampling is employed for both the predictive distribution at test points and the uncertain frontier, at increased computational cost (Mamun et al., 2024).

5. Implementation and Computational Considerations

Surrogate Model Details

GP surrogates are fit either independently per objective or as a single model on the scalarized data.
Kernel selection typically includes SE or Matérn 5/2 plus a linear term.
Hyperparameters (length scales, signal variance, noise variance) are optimized by maximum marginal likelihood.

Computational Complexity

Each GP fit incurs $O(N^3)$ cost for $N$ data points per iteration.
Delaunay triangulation in TRIPE has exponential complexity in the dimension of the triangulated space, limiting its use to low-dimensional problems.
WAPE overhead grows only linearly per iteration.
Batched and noisy variants entail costly MC integration, especially for high acquisition function accuracy.

Optimization Details

EI maximization is generally done by multi-start L-BFGS, grid search, or CMA-ES.
Weight vectors for scalarization are typically drawn uniformly from the simplex (e.g., via Dirichlet(1) sampling).
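Uniform sampling from the simplex is equivalent to drawing from a Dirichlet distribution with all concentration parameters equal to one, which can be done by normalizing independent unit-rate exponential draws. A minimal standard-library sketch, with an illustrative function name:

```python
import random
from typing import List


def sample_simplex(m: int, rng: random.Random) -> List[float]:
    """Draw one weight vector uniformly from the simplex (Dirichlet(1,...,1))."""
    draws = [rng.expovariate(1.0) for _ in range(m)]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(42)
weights = [sample_simplex(3, rng) for _ in range(4)]
```

Each returned vector has non-negative entries that sum to one, so it can be passed directly to the scalarization step.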

Common Implementation Choices

Parameter	Typical Value / Method
Initial design	Space-filling sample (Latin hypercube, Halton)
Scalarization	$\rho = 0.01$–$0.05$
Number of weights per iteration	1 (standard), several in WAPE
Kernel	SE or Matérn 5/2 $+$ linear
EI optimizer	20 L-BFGS restarts, CMA-ES

6. Empirical Performance and Comparative Analysis

Extensive benchmarks demonstrate key properties of ParEGO and its extensions:

On DTLZ2-type problems, WAPE exhibited the fastest convergence in opportunity cost, reliably recovering the DM’s true optimum (Heidari et al., 2024).
TRIPE outperformed the interactive ParEGO baseline on low-dimensional problems, but its efficiency collapsed in higher dimensions due to triangulation overhead.
Standard ParEGO (without DM feedback) wastes most of the evaluation budget on globally exploring the front, leading to slower convergence of opportunity cost and wider dispersion in attained solutions.

In discrete multi-component alloy design, noisy and batched ParEGO (qNparEGO, qparEGO) attained a lower fraction of the maximal hypervolume than hypervolume-based methods such as qEHVI/qNEHVI, which reached higher hypervolume in fewer iterations. The bias inherent in weighted Tchebycheff scalarization causes ParEGO to sample the Pareto set unevenly, sometimes neglecting wide swathes of the front (Mamun et al., 2024).
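The hypervolume indicator used in such comparisons measures the region of objective space dominated by a set of points and bounded by a reference point. For two minimized objectives it can be computed with a single sweep, as in the sketch below; the function name, the data, and the reference point are illustrative and unrelated to the alloy study.

```python
from typing import Sequence, Tuple


def hypervolume_2d(points: Sequence[Tuple[float, float]],
                   ref: Tuple[float, float]) -> float:
    """Area dominated by the points and bounded by ref (two minimized objectives)."""
    inside = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in inside:
        if f2 < best_f2:
            # Add the new horizontal strip contributed by this point.
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


front = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
hv = hypervolume_2d(front, ref=(5.0, 5.0))   # strips of area 4, 6, and 1
```

Dominated points contribute nothing to the sweep, so the indicator rewards sets that both approach and cover the front, which is the property scalarization-based sampling can miss.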

7. Strengths, Limitations, and Application Context

ParEGO’s chief advantages are algorithmic simplicity, ease of extension to parallel and noisy settings, and flexibility across different surrogate models. It is highly competitive in scenarios where high evaluation cost or low noise predominate and where covering the entire Pareto front is less critical than converging to the DM’s preferred solution. Its data-efficient interactive variants (WAPE and TRIPE) are especially effective when the DM's preferences drive the optimization loop, as is common in engineering design (Heidari et al., 2024).

Limitations include an inherent scalarization bias, inefficiency in high-dimensional design spaces or fronts, and compromised performance compared to hypervolume-based BO methods for comprehensive Pareto front discovery (Mamun et al., 2024). Noisy and batch variants close some efficiency gaps at substantial compute cost.

ParEGO and its variants remain foundational within the Bayesian multi-objective optimization literature, particularly for interactive, budget-constrained applications demanding efficient navigation of black-box trade-offs.

References (3)

Data-Efficient Interactive Multi-Objective Optimization Using ParEGO (2024)

One Step Preference Elicitation in Multi-Objective Bayesian Optimization (2021)

Accelerated Development of Multicomponent Alloys in Discrete Design Space Using Bayesian Multi-Objective Optimisation (2024)
