Chebyshev Scalarization

Updated 11 May 2026

Chebyshev scalarization is a method for multi-objective optimization that converts vector objectives into a single scalar value by using a weighted maximum deviation from an ideal point.
It offers complete Pareto coverage and tight approximation guarantees: its minimizers are weakly Pareto optimal, and strictly Pareto optimal under mild conditions, even in non-convex and high-dimensional settings.
Algorithmic variants like smooth, set-based, and target-adaptive scalarizations enable efficient gradient-based optimization with improved convergence rates and scalability.

Chebyshev scalarization is a family of scalarizing techniques for multi-objective optimization, transforming a vector-valued objective into a single-valued function by applying a weighted max (ℓ∞ or Chebyshev) norm to the deviations from a reference (typically ideal or utopian) point. This approach yields both theoretical guarantees and significant practical advantages for discovering the full set of (weak) Pareto optimal solutions, including in non-convex and many-objective regimes.

1. Mathematical Formulation and Definitions

For a vector-valued minimization problem with $m$ objectives,

$\min_{x \in X} \; f(x) = (f_1(x), \ldots, f_m(x))$

given a strictly positive weight vector $w \in \Delta^{m-1}$ (the unit simplex) and an ideal (utopian) point $z^* \in \mathbb{R}^m$, the Chebyshev scalarization is defined as:

$\phi_T(x; w, z^*) = \max_{i=1,\ldots,m} w_i \, |f_i(x) - z^*_i|$

or, if one ensures $f_i(x) \geq z^*_i$ (typically for minimization),

$\phi_T(x; w, z^*) = \max_{i=1,\ldots,m} w_i (f_i(x) - z^*_i)$

This function transforms the multi-objective problem into a single scalar function emphasizing the worst-case (largest) weighted deviation among objectives (Lin et al., 2024, Liu et al., 2024, Helfrich et al., 2023).
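As a minimal sketch of the definition above, the scalarization can be evaluated directly on a vector of objective values. The function name `chebyshev_scalarization` and the numbers in the example are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def chebyshev_scalarization(f_vals, w, z_star):
    """Weighted Chebyshev scalarization: max_i w_i * |f_i - z*_i|."""
    f_vals = np.asarray(f_vals, dtype=float)
    w = np.asarray(w, dtype=float)
    z_star = np.asarray(z_star, dtype=float)
    return float(np.max(w * np.abs(f_vals - z_star)))

# Hypothetical two-objective example.
# Weighted deviations are 0.25 * 2 = 0.5 and 0.75 * 1 = 0.75,
# so the second objective determines the scalarized value.
value = chebyshev_scalarization([3.0, 1.0], w=[0.25, 0.75], z_star=[1.0, 0.0])
```

Only the objective with the largest weighted deviation contributes to the value, which is why the function is nonsmooth wherever two objectives tie.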

A point $x^*$ minimizing $\phi_T(\cdot; w, z^*)$ is weakly Pareto optimal for the original problem. Under mild regularity ($w_i > 0$ for all $i$ or uniqueness), $x^*$ is strictly Pareto optimal (Liu et al., 2024, Helfrich et al., 2023). The Chebyshev scalarization is exact, in the sense that every weak Pareto point can be realized as an optimizer for some $w$ (Helfrich et al., 2023, Silva et al., 2022).

Smooth Chebyshev Scalarization: To enable gradient-based optimization, the nonsmooth “max” is often replaced by the log-sum-exp (LSE) surrogate:

$\phi_T^{\mu}(x; w, z^*) = \mu \log \left( \sum_{i=1}^{m} \exp\left( \frac{w_i (f_i(x) - z^*_i)}{\mu} \right) \right)$

As the smoothing parameter $\mu \to 0^+$, this converges uniformly to the nondifferentiable Chebyshev scalarization (Lin et al., 2024, Lin et al., 2024).
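The log-sum-exp surrogate can be sketched as follows, assuming the standard form `mu * log(sum_i exp(w_i * (f_i - z*_i) / mu))` with a smoothing parameter called `mu` here. For $m$ objectives, this form lies between the exact max and the max plus `mu * log(m)`, so the gap closes as `mu` shrinks. The function name and the test values are illustrative assumptions.

```python
import numpy as np

def smooth_chebyshev(f_vals, w, z_star, mu):
    """mu * log(sum_i exp(w_i * (f_i - z*_i) / mu)), computed stably."""
    t = np.asarray(w, float) * (np.asarray(f_vals, float) - np.asarray(z_star, float))
    t_max = np.max(t)
    # Shift by the max before exponentiating to avoid overflow for small mu.
    return float(t_max + mu * np.log(np.sum(np.exp((t - t_max) / mu))))

# Hypothetical three-objective example.
f_vals, w, z_star = [3.0, 1.0, 2.0], [0.2, 0.5, 0.3], [0.0, 0.0, 0.0]
exact = max(wi * (fi - zi) for wi, fi, zi in zip(w, f_vals, z_star))

# Gap between the surrogate and the exact max for decreasing mu.
mus = (1.0, 0.1, 0.01)
gaps = [smooth_chebyshev(f_vals, w, z_star, mu) - exact for mu in mus]
```

Subtracting the maximum before exponentiating matters in practice: for small `mu` the unshifted exponentials overflow long before the surrogate itself becomes large.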

2. Theoretical Properties

Chebyshev scalarization possesses several key theoretical guarantees:

Complete Pareto Coverage: Every (weak) Pareto optimal solution is a minimizer of some Chebyshev scalarization for a suitable $w$ and $z^*$ (Helfrich et al., 2023, Liu et al., 2024, Silva et al., 2022). Linear (weighted-sum) scalarization generically fails to find points on non-convex parts of the front.
Exact Approximation Quality: In the general theory of scalarizations, Chebyshev (ℓ∞-norm) scalarization achieves the tightest possible approximation factor; no other scalarization can improve upon this for compact feasible sets (Helfrich et al., 2023).
Duality and Invariance: The perfect approximation guarantee extends to any combination of minimization and maximization objectives via a dualization (flip) transformation applied to the maximization objectives (Helfrich et al., 2023).
Sufficient and Necessary Global Characterization: Integral conditions applied to the Chebyshev scalarization yield necessary and sufficient criteria for global weak Pareto optimality (mean equals level, zero variance over level sets) (Silva et al., 2022).

Smooth Chebyshev scalarization maintains these properties in the limit as the smoothing parameter tends to zero and enables provable convergence guarantees for gradient-based methods; for convex objectives, accelerated rates are achievable (Lin et al., 2024, Liu et al., 2024).
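The coverage property can be illustrated on a small toy problem. The sketch below assumes a hypothetical bi-objective minimization problem whose front is concave, discretizes it on a grid, and compares which points a weighted sum and a Chebyshev scalarization select for the same weight vectors. The problem, grid, and weights are all illustrative choices.

```python
import numpy as np

# Hypothetical problem with a non-convex (concave) front:
# f1(t) = t, f2(t) = 1 - t**2 for t in [0, 1]. Every t is Pareto optimal,
# because f1 increases and f2 decreases in t.
t = np.linspace(0.0, 1.0, 1001)
F = np.stack([t, 1.0 - t**2], axis=1)    # objective values, shape (1001, 2)
z_star = np.array([0.0, 0.0])            # ideal point: componentwise minima

weights = [np.array([a, 1.0 - a]) for a in np.linspace(0.1, 0.9, 9)]

# Weighted sum: argmin_t of w . f(t)
ws_picks = [t[np.argmin(F @ w)] for w in weights]

# Chebyshev: argmin_t of max_i w_i * (f_i(t) - z*_i)
tch_picks = [t[np.argmin(np.max(w * (F - z_star), axis=1))] for w in weights]
```

On this toy problem the weighted sum is concave in `t`, so it only ever selects an endpoint of the front, while each Chebyshev subproblem selects a different interior point where the two weighted deviations balance.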

3. Algorithmic Techniques and Variants

Several frameworks for optimizing with Chebyshev scalarization are prominent, tailored for different problem structures:

(a) Gradient-based Methods

Subgradient methods can be applied directly, but are hindered by nondifferentiability at ties. Smooth Chebyshev scalarization using LSE surrogates permits use of standard first-order or accelerated algorithms, with explicit gradients:

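$\nabla_x \phi_T^{\mu}(x; w, z^*) = \sum_{i=1}^{m} p_i(x) \, w_i \, \nabla f_i(x)$

where $p_i(x) = \frac{\exp\left( w_i (f_i(x) - z^*_i) / \mu \right)}{\sum_{j=1}^{m} \exp\left( w_j (f_j(x) - z^*_j) / \mu \right)}$ is the normalized softmax weight and $\mu > 0$ is the smoothing parameter of the LSE surrogate (Lin et al., 2024).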
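A minimal sketch of this gradient, assuming the LSE surrogate `mu * log(sum_i exp(w_i * (f_i(x) - z*_i) / mu))`, is given below. The gradient is a softmax-weighted combination of the per-objective gradients. The helper name `smooth_tch_value_and_grad`, the two quadratic objectives, and all numeric values are hypothetical.

```python
import numpy as np

def smooth_tch_value_and_grad(x, fs, grads, w, z_star, mu):
    """Value and gradient of mu * log(sum_i exp(w_i (f_i(x) - z*_i) / mu))."""
    t = w * (np.array([f(x) for f in fs]) - z_star)
    t_max = t.max()
    e = np.exp((t - t_max) / mu)         # shifted exponentials for stability
    p = e / e.sum()                      # softmax weights p_i(x), summing to 1
    value = t_max + mu * np.log(e.sum())
    J = np.stack([g(x) for g in grads])  # (m, n) Jacobian of the objectives
    grad = (p * w) @ J                   # sum_i p_i * w_i * grad f_i(x)
    return value, grad, p

# Two hypothetical quadratic objectives in R^2.
a, b = np.array([1.0, 0.0]), np.array([-1.0, 2.0])
fs = [lambda x: np.sum((x - a) ** 2), lambda x: np.sum((x - b) ** 2)]
grads = [lambda x: 2.0 * (x - a), lambda x: 2.0 * (x - b)]
w, z_star = np.array([0.7, 0.3]), np.zeros(2)
x0 = np.array([0.2, 0.5])

value, grad, p = smooth_tch_value_and_grad(x0, fs, grads, w, z_star, mu=0.1)
```

As `mu` decreases, `p` concentrates on the objective with the largest weighted deviation, and the gradient approaches a subgradient of the nonsmooth max.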

(b) Online Mirror Descent

A saddle-point formulation is employed in OMD-TCH, optimizing $\min_{x \in X} \max_{\lambda \in \Delta^{m-1}} \sum_{i=1}^{m} \lambda_i \, w_i (f_i(x) - z^*_i)$ with mirror descent for each player. The method enjoys a convergence rate of $O(\sqrt{\log m / T})$ over $T$ rounds, with the adaptive AdaOMD-TCH conversion further improving practical performance without loss of theoretical guarantees (Liu et al., 2024).
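The two-player idea can be sketched on a toy problem. The code below is not the OMD-TCH algorithm of the cited paper. It is a simplified illustration that assumes the max over objectives is rewritten as a max over simplex weights `lam`, with a plain gradient step for `x` and an exponentiated-gradient (entropic mirror) step for `lam`. The objectives, step sizes, and iteration count are hypothetical.

```python
import numpy as np

def f(x):      # two hypothetical convex objectives of a scalar x
    return np.array([(x - 1.0) ** 2, (x + 1.0) ** 2])

def df(x):     # their derivatives
    return np.array([2.0 * (x - 1.0), 2.0 * (x + 1.0)])

w = np.array([0.5, 0.5])
z_star = np.array([0.0, 0.0])

x, lam = 2.0, np.array([0.5, 0.5])
eta_x, eta_lam, T = 0.1, 0.1, 2000
x_sum = 0.0
for _ in range(T):
    g = w * (f(x) - z_star)                        # weighted deviation per objective
    x_new = x - eta_x * float(lam @ (w * df(x)))   # descent step for the x-player
    lam = lam * np.exp(eta_lam * g)                # exponentiated-gradient ascent
    lam = lam / lam.sum()                          # renormalize onto the simplex
    x = x_new
    x_sum += x

x_avg = x_sum / T                                  # averaged iterate
tch_at_avg = float(np.max(w * (f(x_avg) - z_star)))
```

For this symmetric toy problem the Chebyshev minimizer is `x = 0` with value `0.5`, and the averaged iterate should land close to it. Averaging is used because the last iterate of descent-ascent dynamics can oscillate around the saddle point.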

(c) Set-based Scalarization

In many-objective optimization (large $m$), Tchebycheff set scalarization (TCH-Set) extends the approach to find a small set of $K$ solutions:

$\min_{\{x^{(k)}\}_{k=1}^{K} \subset X} \; \max_{1 \le i \le m} \; w_i \left( \min_{1 \le k \le K} f_i(x^{(k)}) - z^*_i \right)$

and its smooth variant (STCH-Set) applies dual log-sum-exp smoothing, to the outer max over objectives and to the inner min over solutions. These methods allow a handful of solutions to collectively cover hundreds of objectives, with each objective addressed well by at least one solution (Lin et al., 2024).
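The set-based objective can be sketched on a matrix of objective values, where row `k` holds the objectives of solution `k`. The exact version takes the best value per objective across the set and then the worst weighted deviation. The smooth version below assumes both the max and the min are replaced by log-sum-exp terms sharing one smoothing parameter `mu`. Function names and numbers are illustrative assumptions.

```python
import numpy as np

def tch_set(F, w, z_star):
    """max_i w_i * (min_k F[k, i] - z*_i) for a (K, m) matrix of objective values."""
    return float(np.max(w * (F.min(axis=0) - z_star)))

def stch_set(F, w, z_star, mu):
    """Doubly smoothed variant: LSE over objectives of a soft-min over solutions."""
    F_min = F.min(axis=0)
    # Soft-min over the K solutions: -mu * log(sum_k exp(-F[k, i] / mu)), shifted.
    soft_min = F_min - mu * np.log(np.exp(-(F - F_min) / mu).sum(axis=0))
    t = w * (soft_min - z_star)
    t_max = t.max()
    # Soft-max over the m objectives, again shifted for stability.
    return float(t_max + mu * np.log(np.exp((t - t_max) / mu).sum()))

# Hypothetical case: K = 2 solutions, m = 4 objectives.
F = np.array([[0.1, 0.2, 0.9, 0.8],
              [0.9, 0.7, 0.3, 0.1]])
w = np.full(4, 0.25)
z_star = np.zeros(4)

set_value = tch_set(F, w, z_star)            # both solutions together
single_value = tch_set(F[:1], w, z_star)     # first solution alone
smooth_value = stch_set(F, w, z_star, mu=1e-3)
```

In this example each solution alone leaves some objective poorly served, while the pair covers all four objectives, which is the effect the set scalarization is designed to reward.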

(d) Target Point–based Scalarization

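The TPTD scalarization defines subproblems using Chebyshev distance to an adaptively placed “target point” on a hyperplane in the normalized objective space, so that each subproblem minimizes the largest coordinate-wise deviation of the normalized objective vector from its target point.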

Adaptive placement of these target points ensures thorough coverage of the Pareto front, even with complex (e.g., inverted triangular) shapes, and is efficiently parallelizable with natural evolution strategies (Nagakane et al., 1 May 2025).

4. Computational and Practical Considerations

Comparison of Chebyshev to other scalarizations reveals practical strengths:

Non-convex Pareto Fronts: Chebyshev scalarization identifies non-convex parts missed by linear scalarization (Liu et al., 2024, Mahapatra et al., 2021, Bednarczuk et al., 2023).
Discrete/Combinatorial Problems: In the multiple-choice knapsack, Chebyshev scalarization (in KISSA) recovers Pareto-optimal points inaccessible to linear methods, improving optimality gaps with negligible computational overhead (Bednarczuk et al., 2023).
Many-objective Regimes: TCH-Set and STCH-Set scale to problems with many objectives using only a small number of solutions, dramatically reducing sample complexity compared to exponential scaling in Pareto covering (Lin et al., 2024).
Gradient Smoothness and Convergence: Smooth Chebyshev surrogates enable efficient, stable convergence; the smoothing parameter should be chosen to balance fidelity to the original max against convergence speed (Lin et al., 2024, Lin et al., 2024).

Setting	Chebyshev Advantage	Source
Non-convex PF	Complete Pareto coverage	(Liu et al., 2024)
Discrete/Knapsack	Tighter optimality gap, hidden points	(Bednarczuk et al., 2023)
Many-objective	Logarithmic solution set size	(Lin et al., 2024)
Smooth optimization	Efficient first-order algorithms	(Lin et al., 2024)
Federated learning	Improved fairness, worst-case coverage	(Liu et al., 2024)

5. Set-based and Adaptive Extensions

Set Scalarization applies Chebyshev selection over the entire set of $K$ points, optimizing the worst “best-for-any-objective” across the set. The STCH-Set surrogate enables scalable, fully differentiable optimization when $m$ and $K$ are large.

Target Point–based Tchebycheff Distance adapts the target for each subproblem based on the geometry of the (possibly non-convex or disconnected) Pareto front, ensuring comprehensive and uniform coverage—even in pathological cases such as inverted triangular fronts. This approach is robust to variable dependencies and optimizes efficiently with evolutionary or black-box single-objective solvers (Nagakane et al., 1 May 2025).

6. Empirical Studies and Applications

Convex Quadratic, Mixed Linear/Nonlinear Regression: STCH-Set achieves the lowest worst-case and often best average objectives, outperforming linear, TCH, MosT, and SoM baselines (Lin et al., 2024).
Multiple-choice Knapsack: KISSA with Chebyshev scalarization improves upon BISSA in ~20% of benchmark instances, reducing optimality gaps especially for weakly correlated data (Bednarczuk et al., 2023).
Federated Learning under Fairness: OMD-TCH and AdaOMD-TCH improve agnostic loss, accuracy parity, and worst-client loss, sometimes sacrificing average accuracy for better fairness guarantees (Liu et al., 2024).
Multi-Task Learning: EPO Search, building on Chebyshev scalarization, yields network parameters tracking specified task tradeoffs and robustly approximating the Pareto front (Mahapatra et al., 2021).
Hypervolume and Wall-Time Metrics: Target point–based Chebyshev scalarization (TPTD) achieves state-of-the-art hypervolume, with up to 474$\times$ speedup over traditional evolutionary multi-objective algorithms (Nagakane et al., 1 May 2025).
Derivative-Free Multiobjective Benchmarks: Integral mean-value methods (MVLSM) based on Chebyshev scalarization are globally convergent, robust, and computationally efficient for low-dimensional settings (Silva et al., 2022).

7. Guidelines and Limitations

Parameterization:

Weights ($w$): Uniform $w_i = 1/m$ works in absence of preference; $w_i > 0$ for all $i$ is required for full Pareto recovery (Lin et al., 2024).
Smoothing (parameter of the LSE surrogate): The value sets a practical tradeoff between smoothness (larger values) and equivalence to the original max (smaller values) (Lin et al., 2024, Lin et al., 2024).
Number of Solutions ($K$ in Set Scalarization): Empirical evidence suggests that a small $K$ suffices for a much larger number of objectives (Lin et al., 2024).

Limitations:

Non-convex loss landscapes may induce local minima or trap solutions in both nonsmooth and smooth Chebyshev optimization. Careful annealing of the smoothing parameter and careful initialization, potentially via pre-solved single-solution scalarizations, can improve outcomes (Lin et al., 2024).
In high-dimensional decision spaces, integral-based methods require surrogates or grid discretization for scalable performance (Silva et al., 2022).
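One way to organize the annealing mentioned above is sketched below, assuming the LSE surrogate with smoothing parameter `mu`: gradient descent runs in stages, and `mu` is lowered between stages while the step size shrinks with it. The schedule, the step-size rule, and the two quadratic objectives are hypothetical choices for illustration, not recommendations from the cited papers.

```python
import numpy as np

a, b = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
w, z_star = np.array([0.5, 0.5]), np.array([0.0, 0.0])

def objectives(x):
    return np.array([np.sum((x - a) ** 2), np.sum((x - b) ** 2)])

def jacobian(x):
    return np.stack([2.0 * (x - a), 2.0 * (x - b)])

def smooth_grad(x, mu):
    """Gradient of mu * log(sum_i exp(w_i (f_i(x) - z*_i) / mu))."""
    t = w * (objectives(x) - z_star)
    e = np.exp((t - t.max()) / mu)
    p = e / e.sum()                       # softmax weights
    return (p * w) @ jacobian(x)

x = np.array([0.8, 0.5])
for mu in (1.0, 0.3, 0.1, 0.03, 0.01):    # hypothetical decreasing schedule
    lr = 0.5 * mu                          # curvature grows like 1/mu, so shrink steps
    for _ in range(200):
        x = x - lr * smooth_grad(x, mu)

final_value = float(np.max(w * (objectives(x) - z_star)))
```

Early stages with large `mu` make fast progress on a well-conditioned surrogate, and later stages refine the solution against a surrogate that is closer to the true max. On this convex toy problem the minimizer is the origin with Chebyshev value `0.5`; on non-convex problems no such guarantee applies.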

A plausible implication is that Chebyshev scalarization, and its recent set-based and target-adaptive variants, are now the canonical toolset for robustly and efficiently approximating and exploring Pareto fronts in diverse, high-dimensional, and complex multi-objective optimization tasks. Their theoretical optimality, invariance across minimization/maximization decompositions, and suitability for both gradient-based and black-box optimization currently surpass alternative scalarization frameworks for general multi-objective applications.

Principal Sources: (Lin et al., 2024, Liu et al., 2024, Lin et al., 2024, Helfrich et al., 2023, Nagakane et al., 1 May 2025, Mahapatra et al., 2021, Bednarczuk et al., 2023, Silva et al., 2022)