ParEGO Algorithm for Multi-Objective Optimization
- ParEGO is a scalarization-based Bayesian optimization algorithm that transforms multi-objective problems into a series of single-objective steps to approximate the Pareto front.
- It employs random weight scalarizations, Gaussian Process modeling, and Expected Improvement to achieve data-efficient searches under tight evaluation budgets.
- Interactive extensions of ParEGO, such as TRIPE and WAPE, incorporate decision-maker feedback to focus the search on preferred regions and enhance convergence.
ParEGO (Pareto Efficient Global Optimization) is a scalarization-based Bayesian optimization algorithm for multi-objective black-box optimization problems under limited evaluation budgets. It converts the multi-objective task, whose aim is to approximate the Pareto front of non-dominated solutions, into a series of single-objective optimization steps via random scalarizations, so that each step needs only one surrogate model and one acquisition search while still enabling data-efficient search for optimal trade-offs. ParEGO has influenced both theory and practical methodology in interactive multi-objective optimization, and its extensions address the challenge of eliciting and exploiting decision-maker preferences in expensive regimes (Heidari et al., 2024; Ungredda et al., 2021; Mamun et al., 2024).
1. Multi-Objective Problem Formulation
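Given a decision space $\mathcal{X} \subseteq \mathbb{R}^d$ and objective functions $f_1, \dots, f_m : \mathcal{X} \to \mathbb{R}$ to be minimized, the Pareto set $\mathcal{X}^*$ comprises solutions not dominated by any other point under the partial order:
$$x \prec x' \iff f_i(x) \le f_i(x') \ \text{for all } i \in \{1, \dots, m\} \ \text{and} \ f_j(x) < f_j(x') \ \text{for some } j.$$
The Pareto front is the image $f(\mathcal{X}^*)$ in objective space. In practical scenarios, exhaustive approximation of the front is infeasible due to evaluation expense. Instead, sampled non-dominated points are iteratively presented to a decision-maker (DM), who selects solutions aligning with their (typically hidden) preferences (Heidari et al., 2024).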
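The dominance relation can be made concrete with a small filter over evaluated objective vectors. This is a minimal sketch assuming all objectives are minimized; the function names `dominates` and `non_dominated` and the sample data are illustrative and do not come from the cited papers.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a dominates b (minimization assumed)."""
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    strictly_better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[int]:
    """Indices of points that no other point dominates."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical evaluated objective vectors (f1, f2).
objs = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front_idx = non_dominated(objs)  # (3.0, 3.0) is dominated by (2.0, 2.0)
```

The filter is quadratic in the number of points, which is usually acceptable when the evaluation budget is small.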
2. ParEGO Core Methodology
ParEGO operates by scalarizing multiple objectives with randomly sampled weight vectors, fitting a GP surrogate on the resulting scalarized values, and directing Bayesian optimization via acquisition functions:
Scalarization: Augmented Weighted Tchebycheff Function
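For a random weight $\lambda \in \Delta^{m-1}$, the standard $(m-1)$-simplex, and reference point $z^*$ (often the componentwise minimum of each objective over observed data), define:
$$g_\lambda(x) = \max_{i = 1, \dots, m} \lambda_i \left( f_i(x) - z_i^* \right) + \rho \sum_{i=1}^{m} \lambda_i \left( f_i(x) - z_i^* \right)$$
where $\rho > 0$ (e.g., 0.01–0.05) breaks ties between points that share the same maximum term, so that minimizers are Pareto-optimal rather than only weakly so. The max term keeps $g_\lambda$ non-smooth; the augmentation term does not make it differentiable. Minimizing $g_\lambda$ for varying $\lambda$ yields Pareto-optimal solutions across the front (Ungredda et al., 2021; Heidari et al., 2024; Mamun et al., 2024).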
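A short sketch of the augmented weighted Tchebycheff scalarization follows. It assumes minimization and a reference point supplied by the caller; the function name `augmented_tchebycheff` and the default `rho=0.05` are illustrative choices, not values prescribed by the cited works.

```python
from typing import Sequence


def augmented_tchebycheff(
    f: Sequence[float],
    weights: Sequence[float],
    ref: Sequence[float],
    rho: float = 0.05,
) -> float:
    """Max weighted deviation from the reference point plus rho times the weighted sum."""
    terms = [w * (fi - zi) for fi, w, zi in zip(f, weights, ref)]
    return max(terms) + rho * sum(terms)


# One objective vector, equal weights, reference point at the origin.
value = augmented_tchebycheff(f=(2.0, 4.0), weights=(0.5, 0.5), ref=(0.0, 0.0))

# Two points with the same max term: the augmentation term breaks the tie
# in favor of the point that is also better in the other objective.
tie_a = augmented_tchebycheff((1.0, 2.0), (0.5, 0.5), (0.0, 0.0))
tie_b = augmented_tchebycheff((2.0, 2.0), (0.5, 0.5), (0.0, 0.0))
```

Without the `rho` term both tie points would score identically, even though the second is dominated by the first.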
Surrogate Modeling: Gaussian Processes
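A GP surrogate $\hat{g}_\lambda(x)$ is fit to the scalarized objective values. The canonical kernel is the squared-exponential (SE) or Matérn 5/2, sometimes with an additive linear kernel. With an SE base kernel, for example:
$$k(x, x') = \sigma_f^2 \exp\!\left( -\frac{1}{2} \sum_{j=1}^{d} \frac{(x_j - x'_j)^2}{\ell_j^2} \right) + \sigma_{\mathrm{lin}}^2 \, x^\top x'$$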
Hyperparameters are estimated by maximizing GP marginal likelihood.
Acquisition: Expected Improvement
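For data $\mathcal{D}_n = \{(x_i, g_\lambda(x_i))\}_{i=1}^{n}$, compute EI on the GP posterior with mean $\mu_n(x)$ and standard deviation $\sigma_n(x)$:
$$\mathrm{EI}(x) = \left( g^* - \mu_n(x) \right) \Phi(u) + \sigma_n(x) \, \phi(u), \qquad u = \frac{g^* - \mu_n(x)}{\sigma_n(x)}$$
where $g^* = \min_i g_\lambda(x_i)$ is the best observed scalarized value, and $\Phi$, $\phi$ are the standard normal CDF and PDF (Ungredda et al., 2021).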
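The closed-form Expected Improvement for a minimized scalarized objective can be sketched with the standard library alone. The function name `expected_improvement` and the handling of a zero posterior standard deviation are assumptions made for this illustration.

```python
import math


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """EI for minimization, given posterior mean mu, std sigma, and incumbent best."""
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is deterministic.
        return max(best - mu, 0.0)
    u = (best - mu) / sigma
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


# At a point whose predicted mean equals the incumbent, EI is sigma * phi(0).
ei_at_incumbent = expected_improvement(mu=0.0, sigma=1.0, best=0.0)
```

The first term rewards a predicted mean below the incumbent (exploitation) and the second rewards posterior uncertainty (exploration).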
Algorithmic Workflow
| Step | Description |
|---|---|
| Initialization | Generate $n_0$ initial samples (e.g., space-filling design) |
| Scalarization | Draw random $\lambda$; compute $g_\lambda(x_i)$ for all evaluated $x_i$ |
| GP Fit | Fit GP on $\{(x_i, g_\lambda(x_i))\}_{i=1}^{n}$ |
| Acquisition | Maximize EI over $\mathcal{X}$; select $x_{n+1}$ |
| Evaluation | Evaluate $f(x_{n+1})$; augment data and repeat |
Final output is the set of non-dominated solutions among all evaluated points.
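The workflow can be illustrated end to end on a toy problem. The sketch below is a simplified stand-in, not the reference implementation: it assumes a one-dimensional design variable, a hypothetical bi-objective test function, an SE kernel with a fixed length scale instead of marginal-likelihood fitting, and EI maximized over a fixed candidate grid. All function names are illustrative.

```python
import math
import random


def objectives(x):
    # Toy bi-objective problem (illustrative assumption): minimize both terms.
    return (x * x, (x - 1.0) ** 2)


def scalarize(f, lam, ref, rho=0.05):
    # Augmented weighted Tchebycheff value of one objective vector.
    terms = [l * (fi - zi) for fi, l, zi in zip(f, lam, ref)]
    return max(terms) + rho * sum(terms)


def kernel(a, b, ell=0.3):
    # Squared-exponential kernel with a fixed length scale (not fitted here).
    return math.exp(-0.5 * ((a - b) / ell) ** 2)


def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    # Forward substitution for L y = b.
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper(L, y):
    # Back substitution for L^T x = y.
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit(X, y, noise=1e-4):
    K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    return X, L, solve_upper(L, solve_lower(L, y))


def gp_predict(model, x_new):
    X, L, alpha = model
    k_star = [kernel(a, x_new) for a in X]
    mu = sum(k * w for k, w in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    return mu, math.sqrt(max(1.0 - sum(vi * vi for vi in v), 0.0))


def expected_improvement(mu, sigma, best):
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    u = (best - mu) / sigma
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


def non_dominated(F):
    dom = lambda a, b: all(p <= q for p, q in zip(a, b)) and a != b
    return [i for i, f in enumerate(F) if not any(dom(g, f) for g in F)]


def parego(n_init=5, n_iter=15, seed=0):
    rng = random.Random(seed)
    grid = [-1.0 + 0.05 * i for i in range(61)]   # candidate designs in [-1, 2]
    X = rng.sample(grid, n_init)                  # initial design
    F = [objectives(x) for x in X]
    for _ in range(n_iter):
        u = rng.random()
        lam = (u, 1.0 - u)                        # random weight on the simplex
        ref = [min(f[i] for f in F) for i in range(2)]
        g = [scalarize(f, lam, ref) for f in F]
        mean = sum(g) / len(g)
        std = math.sqrt(sum((v - mean) ** 2 for v in g) / len(g)) or 1.0
        y = [(v - mean) / std for v in g]         # standardize for a zero-mean GP
        model = gp_fit(X, y)
        best = min(y)
        candidates = [c for c in grid if c not in X]
        x_next = max(candidates,
                     key=lambda c: expected_improvement(*gp_predict(model, c), best))
        X.append(x_next)
        F.append(objectives(x_next))
    return X, F


X, F = parego()
front = non_dominated(F)  # indices of the returned non-dominated designs
```

A fresh weight vector is drawn in every iteration, so the same evaluated data are re-scalarized and the GP is refit each time; this is what lets a single-objective loop spread its samples along the front.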
3. Interactive and Data-Efficient Extensions
Elicitation of Preferences
Conventional ParEGO explores the entire Pareto front, often oversampling regions irrelevant to the DM. To address this, interactive extensions incorporate DM feedback, focusing search on the locally preferred region after the DM selects a "favorite" trade-off point (Heidari et al., 2024).
TRIPE: Triangulation-Based Region Exploration
TRIPE restricts exploration to the neighborhood of the DM's selected design among the evaluated points:
- Construct Delaunay triangulation on evaluated points.
- Identify simplices containing the current preferred point.
- Generate candidates as simplex centroids and convex-hull facet centers adjacent to the selection.
- Evaluate at these neighbors, updating the front and eliciting new DM preferences.
TRIPE is hyperparameter-free but scales poorly as the dimension of the triangulated space grows, because the cost of the triangulation increases rapidly with dimension (Heidari et al., 2024).
WAPE: Weight-Adjustment-Based Exploration
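Rather than limiting the search to local neighborhoods of evaluated points, WAPE samples new weights near the weight vector $\lambda^*$ associated with the DM's selected solution. Specifically:
- Perturb $\lambda^*$ multiplicatively by a random factor whose magnitude is controlled by a spread parameter $\sigma$.
- Normalize each perturbed vector $\tilde{\lambda}$ to the simplex: $\lambda' = \tilde{\lambda} / \sum_{j=1}^{m} \tilde{\lambda}_j$.
- For each $\lambda'$, run the standard ParEGO GP/EI step and evaluate new points, with subsequent DM feedback.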
- WAPE efficiently drives convergence toward the preferred region, outpacing both baseline and TRIPE in empirical tests (Heidari et al., 2024).
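The weight-adjustment step can be sketched as follows. The exact perturbation distribution is not recoverable from the text above, so this example assumes a multiplicative factor of `1 + e` with `e` drawn uniformly from `[-spread, spread]`; that choice, along with the names `perturb_weights`, `spread`, and `n_new`, is hypothetical.

```python
import random
from typing import List, Sequence


def perturb_weights(
    w_selected: Sequence[float],
    spread: float,
    n_new: int,
    rng: random.Random,
) -> List[List[float]]:
    """Sample n_new weight vectors near w_selected and renormalize to the simplex."""
    if not 0.0 <= spread < 1.0:
        raise ValueError("spread must lie in [0, 1) so that factors stay positive")
    new_weights = []
    for _ in range(n_new):
        # Assumed perturbation: independent multiplicative factor per component.
        scaled = [w * (1.0 + rng.uniform(-spread, spread)) for w in w_selected]
        total = sum(scaled)
        new_weights.append([s / total for s in scaled])
    return new_weights


# Hypothetical weight vector tied to the DM's selected solution.
weights = perturb_weights([0.6, 0.3, 0.1], spread=0.2, n_new=4, rng=random.Random(0))
```

A smaller `spread` keeps the new scalarizations close to the DM's current preference, while a larger one widens the explored region of the front.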
4. Algorithmic Variants: Batched and Noisy ParEGO
Parallel (Batch) ParEGO
qparEGO generalizes ParEGO to batch selection ($q > 1$) by applying the acquisition function sequentially, conditioning each choice on the points already selected for the current batch. Each iteration involves fitting GPs, sampling weights, computing the current-best scalarization, and selecting the $q$ maximizers one at a time (Mamun et al., 2024).
Noisy ParEGO (qNparEGO)
In the presence of noisy observations, qNparEGO integrates over GP posterior uncertainty when defining the incumbent best, yielding an acquisition function that averages improvement over both the posterior function values at the candidate points and the uncertain Pareto frontier. Monte Carlo sampling is employed for both the predictive distribution at test points and the uncertain frontier, at increased computational cost (Mamun et al., 2024).
5. Implementation and Computational Considerations
Surrogate Model Details
- GP surrogates are fit either independently per objective or as a single model on the scalarized data.
- Kernel selection typically includes SE or Matérn 5/2 plus a linear term.
- Hyperparameters (length scales, signal variance, noise variance) are optimized by maximum marginal likelihood.
Computational Complexity
- Each GP fit incurs $O(n^3)$ cost for $n$ data points per iteration.
- Delaunay triangulation in TRIPE has complexity that grows exponentially with dimension, limiting its use to low-dimensional problems.
- WAPE overhead grows only linearly in the number of sampled weight vectors and the number of objectives per iteration.
- Batched and noisy variants entail costly MC integration, especially for high acquisition function accuracy.
Optimization Details
- EI maximization is generally done by multi-start L-BFGS, grid search, or CMA-ES.
- Weight vectors for scalarization are typically drawn uniformly from the simplex (e.g., via Dirichlet(1) sampling).
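Uniform sampling from the simplex can be done without a dedicated Dirichlet routine, because normalizing independent Exponential(1) draws yields a Dirichlet(1, ..., 1) sample. The sketch below uses only the standard library; the function name `sample_simplex_weight` is illustrative.

```python
import random
from typing import List


def sample_simplex_weight(m: int, rng: random.Random) -> List[float]:
    """Draw one weight vector uniformly from the (m-1)-simplex."""
    # Normalized i.i.d. Exponential(1) variables are Dirichlet(1, ..., 1) distributed.
    draws = [rng.expovariate(1.0) for _ in range(m)]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(42)
lam = sample_simplex_weight(3, rng)  # one weight per objective, summing to 1
```

Sampling each weight uniformly in [0, 1] and then normalizing would not be uniform on the simplex; it concentrates mass near the center.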
Common Implementation Choices
| Parameter | Typical Value / Method |
|---|---|
| Initial design | Space-filling sample (Latin hypercube, Halton) |
| Scalarization | $\rho$ = 0.01–0.05 |
| Number of weights per iteration | 1 (standard), several in WAPE |
| Kernel | SE or Matérn 5/2 + linear |
| EI optimizer | 20 L-BFGS restarts, CMA-ES |
6. Empirical Performance and Comparative Analysis
Extensive benchmarks demonstrate key properties of ParEGO and its extensions:
- On DTLZ2-type problems, WAPE exhibited the fastest convergence in opportunity cost, reliably recovering the DM’s true optimum (Heidari et al., 2024).
- TRIPE outperformed the interactive ParEGO baseline in low dimensions, but its efficiency collapsed in higher dimensions due to triangulation overhead.
- Standard ParEGO (without DM feedback) wastes most of the evaluation budget on globally exploring the front, leading to slower convergence of opportunity cost and wider dispersion in attained solutions.
In discrete multi-component alloy design, noisy and batched ParEGO (qNparEGO, qparEGO) attained a smaller fraction of the maximal hypervolume than hypervolume-based methods such as qEHVI/qNEHVI, which reached higher fractions in fewer steps. The bias inherent in weighted Chebyshev scalarization causes ParEGO to sample the Pareto set unevenly, sometimes neglecting wide swathes of the front (Mamun et al., 2024).
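The hypervolume indicator used in such comparisons measures the region dominated by a solution set and bounded by a reference point. The following is a minimal two-objective sketch assuming minimization; the function name `hypervolume_2d`, the sample points, and the reference point are illustrative and unrelated to the alloy-design data.

```python
from typing import Iterable, Tuple


def hypervolume_2d(points: Iterable[Tuple[float, float]],
                   ref: Tuple[float, float]) -> float:
    """Area dominated by the points and bounded by ref (both objectives minimized)."""
    # Keep only points that strictly improve on the reference in both objectives.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < prev_f2:  # skip dominated points
            area += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return area


hv = hypervolume_2d([(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)], ref=(5.0, 5.0))
```

Dividing `hv` by the hypervolume of the true or best-known front gives the fraction of maximal hypervolume that a method has reached.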
7. Strengths, Limitations, and Application Context
ParEGO’s chief advantages are algorithmic simplicity, ease of extension to parallel and noisy settings, and flexibility across different surrogate models. It is highly competitive in scenarios where high evaluation cost or low noise predominate and where covering the entire Pareto front is less critical than converging to the DM’s preferred solution. Its data-efficient interactive variants (WAPE and TRIPE) are especially effective when the DM's preferences drive the optimization loop, as is common in engineering design (Heidari et al., 2024).
Limitations include an inherent scalarization bias, inefficiency in high-dimensional design spaces or fronts, and compromised performance compared to hypervolume-based BO methods for comprehensive Pareto front discovery (Mamun et al., 2024). Noisy and batch variants close some efficiency gaps at substantial compute cost.
ParEGO and its variants remain foundational within the Bayesian multi-objective optimization literature, particularly for interactive, budget-constrained applications demanding efficient navigation of black-box trade-offs.