ParEGO Algorithm for Multi-Objective Optimization

Updated 23 June 2026
  • ParEGO is a scalarization-based Bayesian optimization algorithm that transforms multi-objective problems into a series of single-objective steps to approximate the Pareto front.
  • It employs random weight scalarizations, Gaussian Process modeling, and Expected Improvement to achieve data-efficient searches under tight evaluation budgets.
  • Interactive extensions of ParEGO, such as TRIPE and WAPE, incorporate decision-maker feedback to focus the search on preferred regions and enhance convergence.

ParEGO (Pareto Efficient Global Optimization) is a scalarization-based Bayesian optimization algorithm for multi-objective black-box optimization problems under limited evaluation budgets. It transforms multi-objective optimization—where the aim is to approximate the Pareto front of non-dominated solutions—into a series of single-objective optimization steps via random scalarizations, managing resource constraints while enabling data-efficient search for optimal trade-offs. ParEGO has influenced both theory and practical methodology in interactive multi-objective optimization, and its extensions address the challenge of eliciting and exploiting decision-maker preferences in expensive regimes (Heidari et al., 2024, Ungredda et al., 2021, Mamun et al., 2024).

1. Multi-Objective Problem Formulation

Given a decision space $X \subset \mathbb{R}^n$ and $m$ objective functions $f(x) = (f_1(x), \dots, f_m(x))$, $x \in X$, the Pareto set $P^* \subset X$ comprises solutions not dominated by any other point under the partial order:

$$x^1 \prec x^2 \iff \forall i: f_i(x^1) \leq f_i(x^2)\ \text{and}\ \exists j: f_j(x^1) < f_j(x^2).$$

The Pareto front is $F(P^*)$ in objective space. In practical scenarios, exhaustive approximation of $P^*$ is infeasible due to evaluation expense. Instead, sampled non-dominated points are iteratively presented to a decision-maker (DM), who selects solutions aligning with their (typically hidden) preferences (Heidari et al., 2024).
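The dominance relation above translates directly into a filter over evaluated objective vectors. The following is a minimal sketch, assuming all objectives are minimized; the function names `dominates` and `non_dominated_mask` are illustrative and not taken from the cited papers.

```python
import numpy as np

def dominates(a, b):
    """Return True if objective vector a dominates b (minimization)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    # No worse in every objective and strictly better in at least one.
    return bool(np.all(a <= b) and np.any(a < b))

def non_dominated_mask(F):
    """Boolean mask over rows of F (n_points x m) that no other row dominates."""
    F = np.asarray(F, dtype=float)
    n = F.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and dominates(F[j], F[i]):
                mask[i] = False
                break
    return mask

# Four evaluated points in a two-objective space; (3, 3) is dominated by (2, 2).
F = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
mask = non_dominated_mask(F)
front = F[mask]
```

The quadratic pairwise loop is adequate for the small evaluation budgets this setting assumes; larger archives would call for a sorted sweep.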

2. ParEGO Core Methodology

ParEGO operates by scalarizing multiple objectives with randomly sampled weight vectors, fitting a GP surrogate on the resulting scalarized values, and directing Bayesian optimization via acquisition functions:

Scalarization: Augmented Weighted Tchebycheff Function

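For a random weight vector $w \in \Delta^m$, the standard simplex of $m$ nonnegative weights summing to one, and reference point $z^* = (z_1^*, \dots, z_m^*)$ (often $z_i^* = \min f_i$ over observed data), define:

$$g_w(x) = \max_{i} w_i\,|f_i(x) - z_i^*| + \rho \sum_{i=1}^{m} w_i\,|f_i(x) - z_i^*|,$$

where $\rho > 0$ (e.g., 0.01–0.05) weights the augmentation term, which breaks ties so that minimizers are Pareto-optimal rather than only weakly Pareto-optimal. Minimizing $g_w$ for varying $w$ yields Pareto-optimal solutions across the front (Ungredda et al., 2021, Heidari et al., 2024, Mamun et al., 2024).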
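As an illustration of the scalarization step, the sketch below assumes the common augmented weighted Tchebycheff form, max_i w_i |f_i − z_i*| + ρ Σ_i w_i |f_i − z_i*|, with the reference point taken as the per-objective minimum over observed data. The function name and the default `rho=0.05` are illustrative choices.

```python
import numpy as np

def augmented_tchebycheff(F, w, z_star, rho=0.05):
    """Scalarize rows of F (n_points x m) with weights w and reference point z_star."""
    F = np.atleast_2d(np.asarray(F, dtype=float))
    d = np.asarray(w, dtype=float) * np.abs(F - np.asarray(z_star, dtype=float))
    # Max term selects the worst weighted deviation; the sum term breaks ties.
    return d.max(axis=1) + rho * d.sum(axis=1)

F = np.array([[1.0, 4.0], [2.0, 2.0], [4.0, 1.0]])
z_star = F.min(axis=0)           # ideal point estimated from observed data
w = np.array([0.5, 0.5])         # equal weights favor the balanced trade-off
g = augmented_tchebycheff(F, w, z_star)
best = int(np.argmin(g))         # index of the point preferred under this w
```

With equal weights the balanced point (2, 2) receives the smallest scalarized value; skewing `w` toward one objective moves the minimizer along the front.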

Surrogate Modeling: Gaussian Processes

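A GP surrogate $\hat{g}_w(x) \sim \mathcal{GP}(\mu(x), k(x, x'))$ is fit to the scalarized objective values. The canonical kernel is the squared-exponential (SE) or Matérn 5/2, sometimes with an additive linear kernel; for the SE case:

$$k(x, x') = \sigma_f^2 \exp\left(-\frac{\|x - x'\|^2}{2\ell^2}\right) + \sigma_{\mathrm{lin}}^2\, x^\top x'.$$

Hyperparameters are estimated by maximizing the GP marginal likelihood.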

Acquisition: Expected Improvement

For data $D = \{(x_i, g_w(x_i))\}_{i=1}^{N}$, compute EI on the GP posterior with mean $\mu(x)$ and standard deviation $\sigma(x)$:

$$\mathrm{EI}(x) = (g^* - \mu(x))\,\Phi(u) + \sigma(x)\,\phi(u), \quad u = \frac{g^* - \mu(x)}{\sigma(x)},$$

where $g^* = \min_i g_w(x_i)$ is the best observed scalarized value, and $\Phi$, $\phi$ are the standard normal CDF and PDF (Ungredda et al., 2021).
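The closed-form EI for a minimization problem can be sketched as follows, assuming a Gaussian posterior with mean `mu` and standard deviation `sigma` at each candidate. The zero-variance branch is an implementation choice of this sketch, not something specified in the cited work.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])

def expected_improvement(mu, sigma, best):
    """EI for minimization: expected amount by which a candidate beats `best`."""
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    sigma = np.atleast_1d(np.asarray(sigma, dtype=float))
    ei = np.zeros_like(mu)
    pos = sigma > 0
    u = (best - mu[pos]) / sigma[pos]
    cdf = 0.5 * (1.0 + _erf(u / math.sqrt(2.0)))            # standard normal CDF
    pdf = np.exp(-0.5 * u**2) / math.sqrt(2.0 * math.pi)    # standard normal PDF
    ei[pos] = (best - mu[pos]) * cdf + sigma[pos] * pdf
    # With no posterior uncertainty, improvement is deterministic.
    ei[~pos] = np.maximum(best - mu[~pos], 0.0)
    return ei

# Three candidates: uncertain at the incumbent, certain and better, certain and worse.
ei = expected_improvement(mu=[0.0, 1.0, 3.0], sigma=[1.0, 0.0, 0.0], best=2.0)
at_incumbent = expected_improvement(mu=[0.0], sigma=[1.0], best=0.0)
```

A candidate whose mean equals the incumbent still has positive EI whenever its variance is nonzero, which is what drives exploration.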

Algorithmic Workflow

| Step | Description |
|---|---|
| Initialization | Generate initial $n_0$ samples (e.g., space-filling design) |
| Scalarization | Draw random $w \in \Delta^m$; compute $g_w(x_i)$ for all evaluated $x_i$ |
| GP Fit | Fit GP on $\{(x_i, g_w(x_i))\}$ |
| Acquisition | Maximize EI over $X$; select $x_{\text{new}}$ |
| Evaluation | Evaluate $f(x_{\text{new}})$; augment data and repeat |

The final output is the set of non-dominated solutions among the evaluated points.
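The workflow can be sketched end to end in a few dozen lines. This is a simplified illustration, not the reference implementation: it assumes a hypothetical two-objective toy problem, an SE kernel with fixed hyperparameters (no marginal-likelihood fit), objectives rescaled to [0, 1] so the reference point is the origin, and EI maximized over random candidates rather than with a gradient-based optimizer.

```python
import math
import numpy as np

def se_kernel(A, B, length=0.3):
    """Squared-exponential kernel with unit signal variance (fixed, assumed)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """Posterior mean and std at Xs for standardized targets y."""
    y_mean, y_std = y.mean(), y.std() + 1e-12
    yn = (y - y_mean) / y_std
    L = np.linalg.cholesky(se_kernel(X, X) + noise * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, yn))
    Ks = se_kernel(X, Xs)
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - (v**2).sum(0), 1e-12, None)
    return Ks.T @ alpha * y_std + y_mean, np.sqrt(var) * y_std

def expected_improvement(mu, sigma, best):
    """Closed-form EI for minimization."""
    u = (best - mu) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf, otypes=[float])(u / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * u**2) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

def toy_objectives(X):
    """Hypothetical bi-objective problem on [0, 1]^2, both minimized."""
    return np.stack([((X - 0.25) ** 2).sum(1), ((X - 0.75) ** 2).sum(1)], axis=1)

def non_dominated(F):
    """Mask of rows of F that no other row dominates."""
    return np.array([not any(np.all(q <= p) and np.any(q < p) for q in F) for p in F])

def parego(n_init=8, n_iter=15, rho=0.05, n_cand=500, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.random((n_init, 2))                      # initialization
    F = toy_objectives(X)
    for _ in range(n_iter):
        w = rng.dirichlet(np.ones(F.shape[1]))       # random weight on the simplex
        lo, hi = F.min(0), F.max(0)
        d = w * (F - lo) / np.where(hi > lo, hi - lo, 1.0)
        g = d.max(1) + rho * d.sum(1)                # augmented Tchebycheff values
        cand = rng.random((n_cand, 2))
        mu, sigma = gp_posterior(X, g, cand)         # GP on scalarized data
        x_new = cand[np.argmax(expected_improvement(mu, sigma, g.min()))]
        X = np.vstack([X, x_new])                    # evaluate and augment
        F = np.vstack([F, toy_objectives(x_new[None, :])])
    return X, F

X, F = parego()
front = F[non_dominated(F)]
```

Because a fresh weight vector is drawn each iteration, the GP is refit from scratch on newly scalarized targets every time; only the raw objective values are carried between iterations.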

3. Interactive and Data-Efficient Extensions

Elicitation of Preferences

Conventional ParEGO explores the entire Pareto front, often oversampling regions irrelevant to the DM. To address this, interactive extensions incorporate DM feedback, focusing search on the locally preferred region after the DM selects a "favorite" trade-off point (Heidari et al., 2024).

TRIPE: Triangulation-Based Region Exploration

TRIPE restricts exploration to the neighborhood of the DM's selected design:

  • Construct Delaunay triangulation on evaluated points.
  • Identify simplices containing the current preferred point.
  • Generate candidates as simplex centroids and convex-hull facet centers adjacent to the selection.
  • Evaluate at these neighbors, updating the front and eliciting new DM preferences.

TRIPE is hyperparameter-free but scales poorly with dimension because of triangulation complexity (Heidari et al., 2024).
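The candidate-generation step can be illustrated with SciPy's Delaunay triangulation. This sketch covers only the simplex-centroid candidates and assumes the selected point is one of the triangulated vertices; the convex-hull facet centers mentioned above are omitted, and the helper name `neighbor_centroids` is hypothetical.

```python
import numpy as np
from scipy.spatial import Delaunay

def neighbor_centroids(points, selected_idx):
    """Centroids of all Delaunay simplices that have the selected point as a vertex."""
    points = np.asarray(points, dtype=float)
    tri = Delaunay(points)
    # tri.simplices holds vertex indices, one row per simplex.
    touching = tri.simplices[(tri.simplices == selected_idx).any(axis=1)]
    return points[touching].mean(axis=1)

# Unit-square corners plus a center point, with the center as the DM's selection.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
candidates = neighbor_centroids(points, selected_idx=4)
```

Here every triangle touches the center, so four candidates are produced, one per triangle. In general only the simplices adjacent to the selection contribute, which is what keeps the search local.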

WAPE: Weight-Adjustment-Based Exploration

Rather than limiting the search to a local neighborhood of the selected design, WAPE samples new weights near the weight vector associated with the DM's selection. Specifically:

  • Perturb the selected weight vector multiplicatively, with the size of the perturbation controlled by a spread parameter.
  • Normalize each new weight vector $w'$ to the simplex: $w' \leftarrow w' / \sum_i w'_i$.
  • For each new weight vector, run the standard ParEGO GP/EI step and evaluate new points, with subsequent DM feedback.
  • WAPE efficiently drives convergence toward the preferred region, outpacing both baseline and TRIPE in empirical tests (Heidari et al., 2024).
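The weight-adjustment step can be sketched as below. The exact perturbation distribution is not recoverable from the text here, so this example assumes, purely for illustration, independent uniform factors in [1 − spread, 1 + spread]; the function name and defaults are likewise hypothetical.

```python
import numpy as np

def perturb_weights(w_selected, spread=0.2, n_new=5, rng=None):
    """Sample weight vectors near w_selected and renormalize them to the simplex."""
    rng = np.random.default_rng() if rng is None else rng
    w_selected = np.asarray(w_selected, dtype=float)
    # Assumed perturbation: componentwise uniform factor around 1 (spread < 1).
    factors = rng.uniform(1.0 - spread, 1.0 + spread, size=(n_new, w_selected.size))
    w_new = w_selected * factors
    return w_new / w_new.sum(axis=1, keepdims=True)

W = perturb_weights([0.2, 0.3, 0.5], spread=0.2, n_new=5, rng=np.random.default_rng(0))
```

Each row of `W` would then drive one standard scalarize, fit, and acquire step, so a smaller `spread` concentrates evaluations more tightly around the DM's current preference.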

4. Algorithmic Variants: Batched and Noisy ParEGO

Parallel (Batch) ParEGO

ParEGO generalizes to batch selection (batch size $q > 1$) by sequentially applying the acquisition function, conditioning on points already chosen in the current batch. Each iteration involves fitting GPs, sampling weights, computing the current-best scalarization, and selecting $q$ maximizers iteratively (Mamun et al., 2024).

Noisy ParEGO (qNparEGO)

In the presence of noisy observations, qNparEGO integrates over GP posterior uncertainty when defining the incumbent best, yielding an acquisition function that averages improvement over uncertainty in both the candidate outcomes and the Pareto frontier. Monte Carlo sampling is employed for both the predictive distribution at test points and the uncertain frontier, at increased computational cost (Mamun et al., 2024).

5. Implementation and Computational Considerations

Surrogate Model Details

  • GP surrogates are independently fit per objective or jointly on scalarized data.
  • Kernel selection typically includes SE or Matérn 5/2 plus a linear term.
  • Hyperparameters (length scales, signal variance, noise variance) are optimized by maximum marginal likelihood.

Computational Complexity

  • Each GP fit incurs $O(N^3)$ cost for $N$ data points per iteration.
  • Delaunay triangulation in TRIPE has complexity exponential in the dimension of the triangulated space, limiting its use to low dimensions.
  • WAPE adds only linear overhead per iteration.
  • Batched and noisy variants entail costly MC integration, especially for high acquisition function accuracy.

Optimization Details

  • EI maximization is generally done by multi-start L-BFGS, grid search, or CMA-ES.
  • Weight vectors for scalarization are typically drawn uniformly from the simplex (e.g., via Dirichlet(1) sampling).
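Uniform sampling on the simplex is available directly in NumPy, since a Dirichlet distribution with all concentration parameters equal to one is uniform over the simplex. A minimal sketch, with `m` and the sample count chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 3                                    # number of objectives (illustrative)

# Dirichlet(1, ..., 1) draws are uniform over the simplex of m weights.
W = rng.dirichlet(np.ones(m), size=4)

# Equivalent construction: normalize independent Exponential(1) draws.
E = rng.exponential(1.0, size=(4, m))
W_alt = E / E.sum(axis=1, keepdims=True)
```

Normalizing uniform random numbers instead would not give a uniform distribution over the simplex, which is why the Dirichlet or exponential construction is preferred.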

Common Implementation Choices

| Parameter | Typical Value / Method |
|---|---|
| Initial design | Space-filling sample (Latin hypercube, Halton) |
| Scalarization | $\rho = 0.01$–$0.05$ |
| Number of weights per iteration | One (standard); several in WAPE |
| Kernel | SE or Matérn 5/2, optionally plus linear |
| EI optimizer | 20 L-BFGS restarts, CMA-ES |

6. Empirical Performance and Comparative Analysis

Extensive benchmarks demonstrate key properties of ParEGO and its extensions:

  • On DTLZ2-type problems, WAPE exhibited the fastest convergence in opportunity cost, reliably recovering the DM’s true optimum (Heidari et al., 2024).
  • TRIPE outperformed the interactive ParEGO baseline in low dimensions, but its efficiency collapsed in higher dimensions due to triangulation overhead.
  • Standard ParEGO (without DM feedback) wastes most of the evaluation budget on globally exploring the front, leading to slower convergence of opportunity cost and wider dispersion in attained solutions.

In discrete multi-component alloy design, noisy and batched ParEGO (qNparEGO, qparEGO) lagged behind hypervolume-based methods such as qEHVI/qNEHVI, which routinely reached higher fractions of the maximal hypervolume in fewer steps. The bias inherent in weighted Tchebycheff scalarization causes ParEGO to sample the Pareto set unevenly, sometimes neglecting wide swathes of the front (Mamun et al., 2024).
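For readers unfamiliar with the hypervolume metric used in such comparisons, the sketch below computes it for the two-objective minimization case by sweeping the front in order of the first objective. The reference point and data are illustrative, and the function is not taken from the cited study.

```python
import numpy as np

def hypervolume_2d(F, ref):
    """Area dominated by the rows of F and bounded by ref (two objectives, minimized)."""
    F = np.asarray(F, dtype=float)
    ref = np.asarray(ref, dtype=float)
    F = F[np.all(F < ref, axis=1)]        # points outside the reference box add nothing
    if len(F) == 0:
        return 0.0
    F = F[np.argsort(F[:, 0])]            # sweep in increasing first objective
    hv, best_f2 = 0.0, ref[1]
    for f1, f2 in F:
        if f2 < best_f2:                  # only improving points add a new strip
            hv += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return float(hv)

front = np.array([[1.0, 4.0], [2.0, 2.0], [4.0, 1.0]])
hv = hypervolume_2d(front, ref=[5.0, 5.0])
hv_with_dominated = hypervolume_2d(np.vstack([front, [[3.0, 3.0]]]), ref=[5.0, 5.0])
```

Dividing such a value by the hypervolume of a known or best-found front gives the "fraction of maximal hypervolume" style of score. Dominated points leave the value unchanged, so the metric rewards only genuine front improvements.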

7. Strengths, Limitations, and Application Context

ParEGO’s chief advantages are algorithmic simplicity, ease of extension to parallel and noisy settings, and flexibility across different surrogate models. It is highly competitive in scenarios where high evaluation cost or low noise predominate and where covering the entire Pareto front is less critical than converging to the DM’s preferred solution. Its data-efficient interactive variants (WAPE and TRIPE) are especially effective when the DM's preferences drive the optimization loop, as is common in engineering design (Heidari et al., 2024).

Limitations include an inherent scalarization bias, inefficiency in high-dimensional design spaces or fronts, and compromised performance compared to hypervolume-based BO methods for comprehensive Pareto front discovery (Mamun et al., 2024). Noisy and batch variants close some efficiency gaps at substantial compute cost.

ParEGO and its variants remain foundational within the Bayesian multi-objective optimization literature, particularly for interactive, budget-constrained applications demanding efficient navigation of black-box trade-offs.
