
ParEGO: Scalarization in Multi-objective Optimization

Updated 13 January 2026
  • ParEGO is a scalarization-based optimization method that converts multi-objective problems into single-objective subproblems using random weight vectors.
  • It leverages Gaussian process surrogates and Expected Improvement acquisition to efficiently explore expensive evaluation settings.
  • Extensions like dynamic hyperparameter importance and interactive preference elicitation enhance data efficiency and solution quality.

ParEGO is a scalarization-based Bayesian optimization method designed for multi-objective problems in which objective evaluations are expensive. Rather than operating directly on the vector-valued function $f(x) = (f_1(x),\dots,f_m(x))$, ParEGO converts the problem into a series of single-objective subproblems via random-weight achievement scalarizations, fitting Gaussian process surrogates on these scalarized objectives and utilizing acquisition functions (typically Expected Improvement) to guide the sampling of new candidate solutions. Its foundations, extensions, and practical empirical successes make ParEGO a cornerstone in data-efficient multi-objective optimization, particularly in settings involving interactive decision-making and high-dimensional parameter spaces.

1. Core Algorithmic Structure and Scalarization

ParEGO seeks to minimize a vector-valued objective $f(x) = (f_1(x),\dots,f_m(x))$ over a domain $x \in \mathcal{X} \subset \mathbb{R}^d$. It transforms the multi-objective minimization into a sequence of single-objective optimizations by scalarization using a randomly sampled weight vector $w = (w_1, \dots, w_m)$ with $w_i \geq 0$ and $\sum_i w_i = 1$ (Ungredda et al., 2021, Heidari et al., 2024, Theodorakopoulos et al., 6 Jan 2026). The scalarizing function most often used is the augmented Tchebycheff:

$$U_w(x) = \max_{i=1,\dots,m} \big( w_i f_i(x) \big) + \rho \sum_{i=1}^m w_i f_i(x)$$

where $\rho$ is a small augmentation value (e.g., $\rho = 10^{-6}$ or $0.05$), ensuring uniqueness and encouraging diversity. In some implementations, a pure weighted-sum scalarization $U_{ws}(x) = \sum_{i=1}^m w_i f_i(x)$ is used, but the augmented Tchebycheff most robustly traces Pareto-optimal solutions.
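As a concrete illustration, the following is a minimal sketch of the augmented Tchebycheff scalarization in Python; the function name and the assumption that objectives are already normalized to a comparable scale are ours, not taken from the cited papers.

```python
import numpy as np

def augmented_tchebycheff(f_vals, w, rho=0.05):
    """Augmented Tchebycheff scalarization U_w(x) for one candidate.

    f_vals : shape (m,), objective values f_1(x), ..., f_m(x),
             assumed already normalized to a comparable scale.
    w      : shape (m,), non-negative weights summing to 1.
    rho    : small augmentation coefficient (e.g., 1e-6 or 0.05).
    """
    weighted = w * np.asarray(f_vals)
    return np.max(weighted) + rho * np.sum(weighted)

# Example with two objectives and equal weights
u = augmented_tchebycheff([0.3, 0.7], np.array([0.5, 0.5]))  # 0.35 + 0.05 * 0.5 = 0.375
```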

The workflow is iterative:

  1. Sample $w$ uniformly from the $m$-simplex.
  2. Compute $U_w(x^n)$ for each previously sampled candidate $x^n$.
  3. Fit a surrogate model (Gaussian process, or alternatives such as a random forest in SMAC) to the pairs $(x^n, U_w(x^n))$.
  4. Define the acquisition function $EI_w(x)$.
  5. Optimize $EI_w(x)$ to propose the next candidate $x_{N+1}$.
  6. Evaluate $f(x_{N+1})$ and integrate it into the archive.
  7. Repeat until the evaluation budget is exhausted.

By cycling the random scalarizations, ParEGO sweeps across trade-offs in objective space, constructing a well-spaced Pareto front approximation (Heidari et al., 2024, Theodorakopoulos et al., 6 Jan 2026).
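The loop above can be sketched compactly in Python. The sketch below uses scikit-learn's Gaussian process and plain random-candidate search for the acquisition maximization; both are simplifying implementation choices rather than part of the original algorithm, and the objectives are assumed pre-normalized.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def parego_step(X, F, objective_fn, bounds, rho=0.05, n_cand=2000, rng=None):
    """One ParEGO iteration: scalarize, fit a GP, maximize EI, evaluate.

    X : (N, d) evaluated inputs;  F : (N, m) objective values (minimization,
        assumed normalized).  objective_fn : expensive black box returning an
        m-vector.  bounds : (d, 2) array of box constraints.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = F.shape[1]

    # 1. Draw a weight vector uniformly from the simplex.
    w = rng.dirichlet(np.ones(m))

    # 2. Augmented Tchebycheff scalarization of all archived points.
    U = np.max(w * F, axis=1) + rho * np.sum(w * F, axis=1)

    # 3. Fit a GP surrogate to the scalarized values.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, U)

    # 4./5. Expected Improvement, maximized over random candidates.
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_cand, X.shape[1]))
    mu, sigma = gp.predict(cand, return_std=True)
    sigma = np.maximum(sigma, 1e-12)
    gamma = (U.min() - mu) / sigma
    ei = sigma * (gamma * norm.cdf(gamma) + norm.pdf(gamma))
    x_next = cand[np.argmax(ei)]

    # 6. Evaluate the expensive objectives and extend the archive.
    f_next = np.atleast_1d(objective_fn(x_next))
    return np.vstack([X, x_next]), np.vstack([F, f_next])
```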

2. Surrogate Modeling and Acquisition

ParEGO deploys Gaussian process (GP) surrogates on the scalarized objectives. Conditioning on $N$ observed samples, the GP posterior provides predictive mean $\mu^n(x)$ and variance $\sigma^2_n(x)$ at any query point $x$, where

  • $\mu^n(x) = k^0(x,X)\,[K^0(X,X)]^{-1}(Y - \mu^0(X)) + \mu^0(x)$
  • $\sigma^2_n(x) = k^0(x,x) - k^0(x,X)\,[K^0(X,X)]^{-1}\,k^0(X,x)$

Here, $k^0$ is the chosen kernel, typically squared-exponential or Matérn, $K^0(X,X)$ is the Gram matrix, $X$ is the input matrix of all sampled $x^n$, and $Y$ is the vector of scalarized observed values.
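A direct transcription of these posterior formulas, assuming a `kernel(A, B)` function that returns the Gram matrix between two point sets and a constant prior mean; the small `jitter` term is our addition for numerical stability.

```python
import numpy as np

def gp_posterior(x, X, Y, kernel, mu0=0.0, jitter=1e-10):
    """Posterior mean and variance at query point x given data (X, Y)."""
    K = kernel(X, X) + jitter * np.eye(len(X))       # K^0(X, X)
    k_x = kernel(x[None, :], X).ravel()              # k^0(x, X)
    mean = mu0 + k_x @ np.linalg.solve(K, Y - mu0)   # mu^n(x)
    var = kernel(x[None, :], x[None, :])[0, 0] - k_x @ np.linalg.solve(K, k_x)
    return mean, var

# Example kernel: squared exponential with unit lengthscale
def sq_exp(A, B):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists)
```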

The acquisition strategy is classical Expected Improvement (EI):

$$EI_w(x) = \mathbb{E}\big[\max(0,\, u_{best} - U_w(x))\big]$$

where $u_{best}$ is the best (minimum) scalarized value observed so far. For GP posteriors, EI has the closed form:

$$EI_w(x) = \sigma^n(x)\,\big[\gamma(x)\,\Phi(\gamma(x)) + \phi(\gamma(x))\big]$$

where $\gamma(x) = (u_{best} - \mu^n(x)) / \sigma^n(x)$, and $\Phi$, $\phi$ are the standard normal cumulative distribution and probability density functions. Optimization of the acquisition function may use direct search, evolutionary algorithms, or similar strategies (Ungredda et al., 2021, Heidari et al., 2024).
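The closed form translates directly into code; a minimal sketch, with a guard against zero predictive variance added on our part:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, u_best):
    """Closed-form EI for minimization of the scalarized objective.

    mu, sigma : GP posterior mean and standard deviation at query points.
    u_best    : lowest scalarized value observed so far.
    """
    sigma = np.maximum(sigma, 1e-12)            # avoid division by zero
    gamma = (u_best - mu) / sigma
    return sigma * (gamma * norm.cdf(gamma) + norm.pdf(gamma))
```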

3. Interactive and Preference-Guided Extensions

The standard ParEGO algorithm can be inefficient for users interested in a single Pareto-optimal solution corresponding to their hidden preferences. As such, several preference-elicitation extensions have been proposed.

The one-step preference elicitation framework (Ungredda et al., 2021) presents the following strategy:

  • At a preselected iteration, a dense surrogate Pareto front is constructed using independent GPs for each objective.
  • The decision maker (DM) selects a preferred solution $x_p$ from this predicted front.
  • The algorithm infers a Tchebycheff-type weight $\hat{w}$ from $f(x_p)$ via $w_i/w_j \approx f_j(x_p)/f_i(x_p)$ and normalization (see the sketch after this list).
  • Final exploitation steps are run with the surrogate and acquisition focused on $U_{\hat{w}}(x)$.
  • If the true evaluation $f(x_{new})$ dominates $f(x_p)$, the new point is immediately recommended; otherwise, the nondominated set is presented to the DM for final selection.
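The weight-inference step admits a one-line implementation: $w_i/w_j \approx f_j(x_p)/f_i(x_p)$ implies $w_i \propto 1/f_i(x_p)$ up to normalization. A minimal sketch, assuming strictly positive (e.g., normalized) objective values:

```python
import numpy as np

def infer_tchebycheff_weight(f_pref, eps=1e-12):
    """Infer hat{w} from the DM's preferred objective vector f(x_p):
    w_i proportional to 1 / f_i(x_p), normalized onto the simplex."""
    w = 1.0 / np.maximum(np.asarray(f_pref, dtype=float), eps)
    return w / w.sum()

w_hat = infer_tchebycheff_weight([0.2, 0.8])   # -> array([0.8, 0.2])
```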

TRIPE and WAPE (Heidari et al., 2024) extend ParEGO to interactive, multi-round preference elicitation:

  • TRIPE uses Delaunay triangulation to generate candidate points near the DM's favorite solution, focusing local search in the Pareto front region. This approach struggles to scale in high input dimensions.
  • WAPE samples new weight vectors around the DM's chosen weights, focusing EI-based search under local perturbations. It is more computationally scalable and demonstrates significant gains in data efficiency (one simple way to realize such local weight sampling is sketched below).
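One simple way to realize such local weight sampling is to draw from a Dirichlet distribution concentrated around the DM's weights; this is an illustrative stand-in, not necessarily the exact perturbation scheme used in WAPE.

```python
import numpy as np

def sample_weights_near(w_dm, concentration=50.0, n=10, rng=None):
    """Sample n weight vectors on the simplex concentrated around w_dm.
    Larger `concentration` keeps samples closer to the DM's weights.
    (Illustrative perturbation scheme, not taken from the WAPE paper.)"""
    rng = np.random.default_rng() if rng is None else rng
    alpha = concentration * np.asarray(w_dm, dtype=float) + 1e-6
    return rng.dirichlet(alpha, size=n)
```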

Empirical results demonstrate that even a single mid-run DM interaction can drastically reduce opportunity cost compared to non-interactive approaches, especially under tight budgets or poorly sampled Pareto fronts (Ungredda et al., 2021, Heidari et al., 2024).

4. Dynamic Hyperparameter Importance in ParEGO

Recent advances integrate dynamic hyperparameter importance (HPI) into ParEGO, targeting settings such as hyperparameter optimization for machine learning models under multiple objectives (Theodorakopoulos et al., 6 Jan 2026). Under each scalarization $f_w(x)$, hyperparameter importance is framed as a cooperative game, and Shapley-value importance scores are computed via the HyperSHAP estimator:

$$\phi_j(w) = \sum_{S \subseteq \{1,\dots,d\}\setminus\{j\}} \frac{|S|!\,(d-|S|-1)!}{d!}\,\big[\hat{f}_w(x_{S \cup \{j\}}) - \hat{f}_w(x_S)\big]$$

where $x_S$ denotes a configuration in which only the hyperparameters in $S$ are active, with all others held at baseline values.
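For small $d$, the Shapley values can be computed by direct enumeration of subsets. The sketch below assumes a value function `value_fn(S)` that returns the surrogate's value for active set $S$; how that value function is constructed (e.g., by HyperSHAP) is abstracted away here.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, d):
    """Exact Shapley values for d hyperparameters via subset enumeration.
    value_fn maps a tuple of active indices S to hat{f}_w(x_S).
    Direct enumeration is feasible only for small d."""
    phi = [0.0] * d
    for j in range(d):
        others = [i for i in range(d) if i != j]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                weight = factorial(len(S)) * factorial(d - len(S) - 1) / factorial(d)
                phi[j] += weight * (value_fn(S + (j,)) - value_fn(S))
    return phi
```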

Algorithmically, after each surrogate fit, the smallest set $S$ of hyperparameters covering a cumulative fraction $\tau$ (e.g., 80%) of the Shapley mass is retained; the others are fixed to their current incumbent values, reducing the search dimension and focusing the sample budget. Random draws and dynamic threshold schedules (e.g., “Symmetric-0.8”) mitigate over-pruning and maintain exploration.
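The pruning step itself is straightforward once per-hyperparameter Shapley scores are available; a minimal sketch, where ranking by absolute Shapley value is our interpretation of "Shapley mass":

```python
import numpy as np

def select_active_hyperparameters(phi, tau=0.8):
    """Smallest set of hyperparameter indices whose absolute Shapley values
    cover a fraction tau of the total mass; the rest are frozen at the
    incumbent configuration."""
    phi = np.abs(np.asarray(phi, dtype=float))
    order = np.argsort(phi)[::-1]                    # most important first
    cum = np.cumsum(phi[order]) / phi.sum()
    k = int(np.searchsorted(cum, tau)) + 1           # smallest prefix covering tau
    return sorted(order[:k].tolist())

active = select_active_hyperparameters([0.5, 0.05, 0.35, 0.10])  # -> [0, 2]
```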

This dynamic HPI-ParEGO achieves consistently faster convergence (20–30% reduction in trial counts) and larger final hypervolumes (up to 15%) compared with vanilla ParEGO and several common baselines across synthetic and real benchmarks (Theodorakopoulos et al., 6 Jan 2026).

5. Surrogate Pareto Front Estimation and Recommendation Strategies

ParEGO can estimate a continuous surrogate Pareto front at any time by fitting independent GPs to each objective $f_i(x)$ and minimizing $(\mu_1^n(x),\dots,\mu_m^n(x))$ over $x \in \mathcal{X}$, typically using NSGA-II (Ungredda et al., 2021). This enables fine-grained exploration and preference elicitation not restricted to already evaluated solutions.
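A simple stand-in for this inner multi-objective search is to filter the per-objective posterior means of random candidates for nondominance; the cited work uses NSGA-II rather than random filtering, and `gps` below is assumed to be a list of fitted per-objective regressors with a scikit-learn-style `predict`.

```python
import numpy as np

def nondominated_mask(F):
    """Boolean mask of Pareto-nondominated rows of F (minimization)."""
    mask = np.ones(len(F), dtype=bool)
    for i in range(len(F)):
        # Row i is dominated if some row is <= in every objective and < in one.
        dominated = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        mask[i] = not dominated.any()
    return mask

def surrogate_pareto_front(gps, bounds, n_cand=5000, rng=None):
    """Approximate surrogate Pareto front from per-objective GP means."""
    rng = np.random.default_rng() if rng is None else rng
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_cand, bounds.shape[0]))
    mu = np.column_stack([gp.predict(cand) for gp in gps])
    keep = nondominated_mask(mu)
    return cand[keep], mu[keep]
```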

Final recommendation logic in preference-elicitation variants proceeds as follows:

  • If any newly proposed solution after DM elicitation dominates the DM’s chosen $x_p$, it is recommended.
  • Otherwise, the nondominated set of sampled solutions is presented for a final choice.

This approach ensures maximal alignment with the DM's utility while preserving the theoretical guarantees of Pareto dominance (Ungredda et al., 2021).
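The first rule reduces to a plain Pareto-dominance check on objective vectors (minimization assumed):

```python
import numpy as np

def dominates(f_a, f_b):
    """True if objective vector f_a Pareto-dominates f_b (minimization)."""
    f_a, f_b = np.asarray(f_a), np.asarray(f_b)
    return bool(np.all(f_a <= f_b) and np.any(f_a < f_b))

# Recommend the new point only if it dominates the DM's chosen f(x_p);
# otherwise present the nondominated archive to the DM for a final choice.
```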

6. Computational and Practical Considerations

ParEGO is designed for resource-constrained, expensive evaluation contexts—examples include engineering design, model selection in machine learning, and experimental materials science. Its single-surrogate approach offers significant data efficiency and scalability relative to methods requiring full multi-objective surrogates or archiving entire Pareto sets at each iteration.

Interactive variants such as WAPE, by refining search around DM preferences, offer approximately an order of magnitude improvement in expensive-evaluation efficiency on standard benchmarks (on DTLZ2, WAPE attains an opportunity cost below $10^{-3}$ in roughly 60 evaluations, versus above $10^{-2}$ for vanilla ParEGO at 100 evaluations) (Heidari et al., 2024).

Triangulation-based schemes quickly lose efficacy as input dimensionality increases (TRIPE stagnates for $d_{in} \geq 5$), while weight-based approaches (WAPE) sustain gains up to nine input dimensions.

Dynamic HPI integration demonstrates robust improvement even in challenging conditional hyperparameter spaces (Theodorakopoulos et al., 6 Jan 2026).

7. Extensions, Generalizations, and Applied Impact

ParEGO's architecture facilitates augmentation with richer scalarization schemes, alternative surrogate models, and hybrid acquisition functions. Its random-weighted scalarization approach also serves as a foundation for algorithms incorporating dynamic importance estimates and interactive preference adaptation.

Empirical evidence supports its superiority in reducing regret and enhancing the final Pareto front quality, outperforming competing Bayesian multi-objective optimization methods such as MO-TPE. Only direct evolutionary algorithms like NSGA-II may match or surpass HPI-ParEGO in specific late-stage convergence scenarios (Theodorakopoulos et al., 6 Jan 2026).

A plausible implication is that future advancements will see increased fusion of ParEGO-type scalarization frameworks with interpretable model-based importance, DM-interactive loops, and scalable surrogate modeling, extending their reach to dynamically complex, high-dimensional real-world multi-objective optimization tasks.
