Preference-Based Learning in Multi-Objective Optimization

Updated 23 June 2026

Preference-based learning is a method that integrates explicit and implicit user trade-offs to focus the search on relevant regions of the Pareto frontier.
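It draws on scalarization, reference point methods, and learned preference-to-solution mappings (Pareto Set Learning, PSL) to approximate the relevant parts of the Pareto set, including partially exact approximations that are exact in selected objectives.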
Recent innovations like MoE model fusion and bilevel optimization offer scalable solutions for real-time decision support in high-dimensional, multi-objective problems.

Preference-based learning in multi-objective optimization refers to methodologies that leverage explicit or implicit user or decision-maker preferences to guide the search or learning process for approximating the Pareto set. The main goal is to generate solutions on or near the Pareto frontier that reflect specified trade-offs among objectives, with an emphasis on efficiently learning representations suitable for decision-making in high-dimensional, complex, or expensive evaluation contexts.

1. Foundations: Pareto Optimality and Preference Incorporation

A multi-objective optimization problem seeks to minimize (or maximize) a vector-valued objective function $f(x) = (f_1(x), \ldots, f_m(x)) \in \mathbb R^m$ over a feasible domain $x \in \mathcal X$. Assuming minimization, a point $x^* \in \mathcal X$ is Pareto-optimal if there is no $x' \in \mathcal X$ such that $f_i(x') \le f_i(x^*)$ for all $i$ and $f_j(x') < f_j(x^*)$ for some $j$. The set of all such non-dominated points forms the Pareto set, whose image under $f$ is the Pareto front.
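For a finite list of candidate objective vectors, the dominance test in this definition can be checked directly. The sketch below is a minimal illustration, assuming minimization and plain Python tuples as objective vectors; the helper names `dominates` and `non_dominated` are illustrative and not taken from any cited work.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b under minimization: a is no worse in every
    objective and strictly better in at least one."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))


def non_dominated(points: List[Vector]) -> List[Vector]:
    """Keep the points that no other point dominates (pairwise O(n^2 m) scan).
    A point never dominates itself, so it need not be excluded from the inner loop."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


pts = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
print(non_dominated(pts))  # [(1.0, 5.0), (2.0, 2.0), (5.0, 1.0)]
```

Here (3.0, 3.0) is removed because (2.0, 2.0) is no worse in both objectives and strictly better in at least one. The quadratic scan is adequate for small candidate sets.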

In real applications, typically only part of the Pareto set is relevant: the part consistent with the decision-maker's desired trade-offs. Preference-based learning operationalizes this by efficiently approximating portions or representations of the Pareto set informed by preference vectors, value functions, or direct queries.

Preference information enters either explicitly (e.g., as scalarization weights, reference points, ranked pairs, or trade-off statements such as "criterion $i$ is more important than criterion $j$") or implicitly, via iterative user feedback or behavioral traces (Haishan et al., 2024, Zakharov et al., 2018).
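A trade-off statement of this kind can be turned into a computable reduction of a finite non-dominated set, in the style of the axiomatic reduction discussed in Section 6. The sketch assumes a quantitative form of the statement: criterion $i$ is more important than criterion $j$ with a relative-importance coefficient $\theta \in (0, 1)$, which is applied by replacing $f_j$ with $\theta f_i + (1 - \theta) f_j$ and recomputing the non-dominated points. The coefficient `theta` and the function names are assumptions for illustration, and minimization is assumed.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """Minimization: a is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def reduce_by_importance(points: List[Vector], i: int, j: int, theta: float) -> List[Vector]:
    """Keep the original points that stay non-dominated after criterion j is replaced
    by theta * f_i + (1 - theta) * f_j (assumed quantitative importance statement:
    'criterion i is more important than criterion j with coefficient theta')."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in (0, 1)")

    def transform(p: Vector) -> Vector:
        q = list(p)
        q[j] = theta * p[i] + (1.0 - theta) * p[j]
        return tuple(q)

    transformed = [transform(p) for p in points]
    # Return original vectors, filtered by dominance in the transformed criteria.
    return [
        p for p, tp in zip(points, transformed)
        if not any(dominates(tq, tp) for tq in transformed)
    ]


pareto = [(1.0, 5.0), (2.0, 2.0), (5.0, 1.0)]
print(reduce_by_importance(pareto, i=0, j=1, theta=0.5))  # [(1.0, 5.0), (2.0, 2.0)]
```

With $\theta = 0.5$, the vector (5.0, 1.0) becomes (5.0, 3.0) in the transformed criteria and is then dominated by (2.0, 2.0), so the statement "criterion 1 matters more" prunes the solution that sacrifices it most.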

2. Scalarization, Reference Point Methods, and Partially Exact Approximations

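Scalarization is a foundational mechanism in preference-based learning. It transforms the vector-valued objective into a scalar surrogate, typically parameterized by a preference vector $\lambda \in \Delta^{m-1} = \{\lambda \in \mathbb R^m_{\ge 0} : \sum_{i=1}^m \lambda_i = 1\}$ (the standard simplex), for example the weighted sum $g_{\mathrm{ws}}(x \mid \lambda) = \sum_{i=1}^m \lambda_i f_i(x)$ or the Tchebycheff function $g_{\mathrm{tch}}(x \mid \lambda) = \max_i \lambda_i \lvert f_i(x) - z_i^* \rvert$ with ideal point $z^*$. By varying $\lambda$, one obtains different efficient points (the weighted sum reaches only supported points on the convex hull of the front, whereas Tchebycheff scalarization can also reach Pareto-optimal points in non-convex regions); this observation grounds several classes of preference-based learning techniques (Lin et al., 2022, Haishan et al., 2024).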
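To see why the choice of scalarization matters, the sketch below evaluates both functions on three Pareto-optimal objective vectors, one of which (N) lies in a non-convex part of the front. It is a toy example under stated assumptions: minimization, a finite candidate set, and the ideal point taken as the componentwise minimum of the candidates.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def weighted_sum(f: Sequence[float], lam: Sequence[float]) -> float:
    # g_ws(x | lam) = sum_i lam_i * f_i(x)
    return sum(w * fi for w, fi in zip(lam, f))


def tchebycheff(f: Sequence[float], lam: Sequence[float], z_star: Sequence[float]) -> float:
    # g_tch(x | lam) = max_i lam_i * |f_i(x) - z*_i|, with z* the ideal point
    return max(w * abs(fi - zi) for w, fi, zi in zip(lam, f, z_star))


def ws_argmin(points: List[Point], lam: Sequence[float]) -> Point:
    return min(points, key=lambda f: weighted_sum(f, lam))


def tch_argmin(points: List[Point], lam: Sequence[float], z_star: Sequence[float]) -> Point:
    return min(points, key=lambda f: tchebycheff(f, lam, z_star))


# Three Pareto-optimal objective vectors; N sits in a non-convex "dent" of the front.
A, N, C = (1.0, 5.0), (3.0, 3.2), (5.0, 1.0)
points = [A, N, C]
ideal = tuple(min(p[i] for p in points) for i in range(2))  # (1.0, 1.0)

# Sweep lambda = (t, 1 - t) over a grid on the simplex.
ws_hits = {ws_argmin(points, (k / 100, 1 - k / 100)) for k in range(101)}
print(sorted(ws_hits))                        # N never appears
print(tch_argmin(points, (0.5, 0.5), ideal))  # (3.0, 3.2)
```

No weight vector makes N strictly better than both A and C under the weighted sum, whereas the Tchebycheff function with equal weights selects it.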

Reference point methods generalize this by seeking $x$ that minimizes the distance (under a weighted norm) to a specified target point $z^{\mathrm{ref}} \in \mathbb R^m$ in objective space, tightly linking to the structure of the Pareto set and enabling the direct translation of classical single-objective approximation techniques to multi-objective contexts (Büsing et al., 2012).
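A minimal sketch of a reference point method over a finite candidate set follows. Instead of a plain weighted norm it uses the augmented achievement scalarizing function $\max_i w_i (f_i - z_i) + \rho \sum_i w_i (f_i - z_i)$ from Wierzbicki's reference point approach, a signed variant of the weighted Chebyshev distance; with positive weights, the signed form keeps the selected point Pareto-optimal even when the reference point is attainable. The weights `w`, the constant `rho`, and the function names are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def achievement(f: Sequence[float], z_ref: Sequence[float], w: Sequence[float],
                rho: float = 1e-3) -> float:
    """Augmented achievement scalarizing function (minimization):
    max_i w_i (f_i - z_i) + rho * sum_i w_i (f_i - z_i), with w_i > 0 and small rho > 0.
    The rho term makes the function strictly increasing in every objective, so a
    dominated candidate always scores worse than the point dominating it."""
    d = [wi * (fi - zi) for fi, zi, wi in zip(f, z_ref, w)]
    return max(d) + rho * sum(d)


def project_reference_point(points: List[Vector], z_ref: Sequence[float],
                            w: Sequence[float]) -> Vector:
    """Return the candidate that best matches the reference point z_ref."""
    return min(points, key=lambda f: achievement(f, z_ref, w))


pts = [(1.0, 5.0), (2.0, 2.0), (5.0, 1.0)]
print(project_reference_point(pts, z_ref=(0.5, 4.5), w=(1.0, 1.0)))  # (1.0, 5.0)
print(project_reference_point(pts, z_ref=(0.0, 0.0), w=(1.0, 1.0)))  # (2.0, 2.0)
```

A reference point placed near the first candidate selects that candidate, while an optimistic reference point at the origin selects the balanced compromise (2.0, 2.0).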

Beyond uniform approximations, preference-based learning facilitates partially exact Pareto set approximations, where exactness in certain objectives is prioritized due to user requirements, while the other objectives are approximated within a $(1+\varepsilon)$ factor (Bazgan et al., 2023, Herzel et al., 2019). The existence of polynomial-size 'one-exact' and related partially exact approximation sets, the complexity of computing them, and their attainable cardinalities have been characterized in detail for various problem classes.
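The difference between uniform and partially exact approximation comes down to a per-objective factor vector. The sketch below checks whether a candidate set covers a finite list of target objective vectors for a given factor vector, where a factor of 1 means "exact in this objective". It assumes minimization and positive objective values; the names `covers` and `is_approximation_set` are illustrative.

```python
from typing import Iterable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def covers(a: Sequence[float], b: Sequence[float], factors: Sequence[float]) -> bool:
    """True if a approximates b: f_i(a) <= factors[i] * f_i(b) for every objective i.
    Assumes minimization and positive objective values; factors[i] == 1.0 means
    the approximation must be exact (no worse) in objective i."""
    return all(ai <= t * bi for ai, bi, t in zip(a, b, factors))


def is_approximation_set(S: Iterable[Vector], Y: Iterable[Vector],
                         factors: Sequence[float]) -> bool:
    """True if every target vector in Y is covered by some vector in S."""
    S_list: List[Vector] = list(S)
    return all(any(covers(s, y, factors) for s in S_list) for y in Y)


eps = 0.1
Y = [(1.0, 5.0), (1.05, 4.9), (2.0, 2.0)]  # objective vectors to be approximated
S = [(1.0, 5.0), (2.0, 2.0)]               # candidate approximation set

print(is_approximation_set(S, Y, (1 + eps, 1 + eps)))  # True: plain eps-approximation
print(is_approximation_set(S, Y, (1.0, 1 + eps)))      # True: exact in f_1
print(is_approximation_set(S, Y, (1 + eps, 1.0)))      # False: nothing matches (1.05, 4.9) exactly in f_2
```

The same two-point set is a valid approximation that is exact in $f_1$ but not one that is exact in $f_2$, which is why the choice of the exact objective is a preference decision.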

3. Learning-Based Representation of the Pareto Set

Preference-conditioned regression and model-based approaches seek to learn a continuous mapping $x = h_\theta(\lambda)$ from the preference simplex $\Delta^{m-1}$ to the decision space, parameterized by $\theta$ (usually the weights of a neural network or other model) (Lin et al., 2022, Tang et al., 2024, Haishan et al., 2024). This approach, termed Pareto Set Learning (PSL), constructs a surrogate manifold that approximates the true Pareto set in decision or objective space. Key components include:

Training Objective: Minimize the expected scalarized loss over a sampled set of preferences,

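$\min_\theta \; \mathbb{E}_{\lambda \sim P_\Lambda}\left[ g\big(h_\theta(\lambda) \mid \lambda\big) \right]$

where $g$ is a scalarization function such as the weighted sum or Tchebycheff function, and $P_\Lambda$ is the distribution from which preferences are sampled (often uniform on the simplex).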

Preference Optimization: Rather than sampling $\lambda$ uniformly, recent advances optimize the selection or distribution of preference points to yield better coverage or focus on user-relevant regions, formalized as a bilevel optimization problem (Haishan et al., 2024).
Batch Acquisition: PSL allows efficient batch selection for expensive function evaluations, as candidate solutions can be generated at arbitrary $\lambda$ by a forward pass through $h_\theta$.
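The training objective above can be illustrated on a toy problem where the answer is known in closed form. This is a minimal sketch, not the implementation of any cited paper: it assumes two quadratic objectives $f_1(x) = \lVert x - a \rVert^2$ and $f_2(x) = \lVert x - b \rVert^2$ in $\mathbb R^2$, weighted-sum scalarization, uniform preference sampling, and a linear model $h_\theta(\lambda) = W\lambda$ standing in for the usual neural network. Under these assumptions the Pareto-optimal solution for $\lambda$ is $\lambda_1 a + \lambda_2 b$, so the learned map can be checked exactly. `A_MIN`, `B_MIN`, and all function names are hypothetical.

```python
import random
from typing import List

Vec = List[float]

# Toy problem (assumption): x in R^2, f1(x) = ||x - A_MIN||^2, f2(x) = ||x - B_MIN||^2.
# For lam = (l1, l2) on the simplex, the weighted sum l1*f1 + l2*f2 is minimized at
# l1*A_MIN + l2*B_MIN, so the Pareto set is the segment between the two minimizers.
A_MIN = [2.0, 0.0]
B_MIN = [0.0, 1.0]


def grad_scalarized(x: Vec, lam: Vec) -> Vec:
    """Gradient in x of g(x | lam) = l1 * f1(x) + l2 * f2(x)."""
    return [2 * lam[0] * (x[d] - A_MIN[d]) + 2 * lam[1] * (x[d] - B_MIN[d]) for d in range(2)]


def h(W: List[Vec], lam: Vec) -> Vec:
    """Preference-to-solution model h_theta(lam) = W @ lam; a linear stand-in for the
    neural network normally used in PSL (theta = the entries of W)."""
    return [sum(W[r][k] * lam[k] for k in range(2)) for r in range(2)]


def sample_preference(rng: random.Random) -> Vec:
    """Uniform sample from the simplex {(u, 1 - u) : 0 <= u <= 1}."""
    u = rng.random()
    return [u, 1.0 - u]


def train_psl(steps: int = 3000, batch: int = 8, lr: float = 0.2, seed: int = 0) -> List[Vec]:
    """Minibatch SGD on the expected scalarized loss E_lam[g(h(W, lam) | lam)]."""
    rng = random.Random(seed)
    W = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        gW = [[0.0, 0.0], [0.0, 0.0]]
        for _ in range(batch):
            lam = sample_preference(rng)
            gx = grad_scalarized(h(W, lam), lam)
            # Chain rule: x_r = sum_k W[r][k] * lam[k], so dx_r / dW[r][k] = lam[k].
            for r in range(2):
                for k in range(2):
                    gW[r][k] += gx[r] * lam[k] / batch
        W = [[W[r][k] - lr * gW[r][k] for k in range(2)] for r in range(2)]
    return W


W = train_psl()
# One forward pass per preference yields a candidate; a batch of preferences yields a batch.
for lam in ([1.0, 0.0], [0.5, 0.5], [0.2, 0.8]):
    print(lam, [round(v, 3) for v in h(W, lam)])
```

After training, the printed candidates should be close to $\lambda_1 a + \lambda_2 b$, e.g. about (1.0, 0.5) for the balanced preference. Generating a new candidate costs only one evaluation of `h`, which is the property that makes batch acquisition cheap.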

Empirical evidence indicates that PSL and its preference-optimized variants, especially when using augmented scalarization or reference-point losses, deliver significantly better coverage and density on the Pareto front than purely regression- or population-based schemes (Lin et al., 2022, Haishan et al., 2024).

4. Preference-Guided Model Fusion and Neural Network Applications

High-dimensional and complex models, such as deep neural networks, require scalable solutions for multi-objective or multi-task preference-based learning. Recent approaches employ Mixture-of-Experts (MoE) model fusion: several expert models are each fine-tuned for a specific objective, and a lightweight router network is trained to synthesize new solutions via preference-conditioned convex combinations of expert parameters (Tang et al., 2024).

Key features:

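The router accepts user-preference vectors $\lambda$ and outputs mixture coefficients $\alpha(\lambda) = (\alpha_1(\lambda), \ldots, \alpha_K(\lambda))$ over the $K$ experts, with $\alpha_k(\lambda) \ge 0$ and $\sum_k \alpha_k(\lambda) = 1$.
The fused model $\theta(\lambda) = \sum_{k=1}^K \alpha_k(\lambda)\, \theta_k$, where $\theta_k$ are the expert parameters, can be deployed in a single pass, with negligible inference overhead.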
Empirical results on Transformer architectures (CLIP, GPT-2) demonstrate that this yields Pareto front approximations closely matching more expensive scalarization or hypernetwork approaches, with orders-of-magnitude less memory or compute per query.

MoE-based approaches enable "on-demand" generation of trade-off solutions, with direct user interaction via setting $\lambda$.
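The inference-time fusion step can be sketched with plain lists standing in for flattened parameter vectors. This is a minimal sketch, not the method of Tang et al. (2024): the linear-plus-softmax router, its weights `R` and `bias`, and the two three-parameter "experts" are hypothetical placeholders set by hand rather than trained, and the actual router architecture may differ. The softmax is what guarantees that the combination is convex.

```python
import math
from typing import List, Sequence

Vec = List[float]


def softmax(z: Sequence[float]) -> Vec:
    """Numerically stable softmax: nonnegative outputs that sum to 1."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def route(lam: Sequence[float], R: List[Vec], bias: Sequence[float]) -> Vec:
    """Hypothetical linear router: logits = R @ lam + bias, alpha = softmax(logits).
    Softmax guarantees alpha lies on the simplex, so the fusion below is convex."""
    logits = [sum(rk * lk for rk, lk in zip(row, lam)) + b for row, b in zip(R, bias)]
    return softmax(logits)


def fuse(alpha: Sequence[float], experts: List[Vec]) -> Vec:
    """theta(lam) = sum_k alpha_k * theta_k, applied entrywise to flattened parameters."""
    dim = len(experts[0])
    return [sum(a * theta[d] for a, theta in zip(alpha, experts)) for d in range(dim)]


# Two placeholder "experts" (flattened parameter vectors), one per objective.
experts = [[1.0, 0.0, 2.0], [0.0, 1.0, -2.0]]
R = [[4.0, 0.0], [0.0, 4.0]]  # placeholder router weights, not trained
bias = [0.0, 0.0]

for lam in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
    alpha = route(lam, R, bias)
    print(lam, [round(a, 3) for a in alpha], [round(v, 3) for v in fuse(alpha, experts)])
```

Per query, only the small router is evaluated and the fused parameters follow from one weighted sum, which is where the low per-query cost of this design comes from.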

5. Algorithmic and Theoretical Guarantees for Approximate Pareto Sets

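Preference-based learning is closely tied to the theory of $\varepsilon$-Pareto sets. For minimization with positive objective values, a set $P_\varepsilon \subseteq \mathcal X$ is an $\varepsilon$-Pareto set if for every feasible $x \in \mathcal X$ there exists $x' \in P_\varepsilon$ such that $f_i(x') \le (1+\varepsilon) f_i(x)$ for all $i = 1, \ldots, m$. Notable results include: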

For problems with polynomially encodable objective values, $\varepsilon$-Pareto sets of size polynomial in the instance size and $1/\varepsilon$ exist for any fixed number $m$ of objectives under mild conditions (Bazgan et al., 2023).
For one-exact approximations (exact in a specified objective, within a factor of $1+\varepsilon$ in the others), similar cardinality guarantees exist, but for $m \ge 3$ objectives no constant-factor approximation of the minimum cardinality is efficiently achievable (Herzel et al., 2019).
For bi-objective settings, efficient algorithms can approximate the Pareto set within a factor of 2 of the smallest possible (and this is tight) (0805.2646).
Approximating Pareto sets and reference-point optimization are closely linked: the computational complexity of preference-based approximation aligns with that of reference-point optimization (Büsing et al., 2012).
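The $\varepsilon$-Pareto definition and the bi-objective case can be made concrete for a finite, explicitly listed point set. In that setting a left-to-right greedy over the front (sorted by $f_1$) returns a smallest $\varepsilon$-Pareto subset of the listed points. The factor-2 guarantee cited above (0805.2646) addresses the harder setting in which the front is only implicitly available, so this sketch is not that algorithm. Minimization and positive objective values are assumed; the function names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Minimization: a is no worse in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def eps_covers(a: Point, b: Point, eps: float) -> bool:
    """f_i(a) <= (1 + eps) * f_i(b) for both objectives (positive values assumed)."""
    return a[0] <= (1 + eps) * b[0] and a[1] <= (1 + eps) * b[1]


def greedy_eps_pareto(points: List[Point], eps: float) -> List[Point]:
    """Smallest eps-Pareto subset of a finite, explicitly listed bi-objective set."""
    # Non-dominated points sorted by f1; along this order f2 strictly decreases.
    front = sorted({p for p in points if not any(dominates(q, p) for q in points)})
    chosen: List[Point] = []
    i = 0
    while i < len(front):
        p = front[i]  # leftmost point not yet covered
        # Among points covering p, the one with the smallest f2 also covers the
        # longest run of later points (their f1 condition holds automatically).
        best = min((q for q in front if eps_covers(q, p, eps)), key=lambda q: q[1])
        chosen.append(best)
        # Covered later points form a contiguous run, so skip past it.
        while i < len(front) and eps_covers(best, front[i], eps):
            i += 1
    return chosen


pts = [(1.0, 10.0), (1.05, 9.8), (1.08, 9.0), (2.0, 5.0),
       (2.1, 4.9), (3.0, 6.0), (4.0, 2.0)]
print(greedy_eps_pareto(pts, eps=0.1))  # [(1.08, 9.0), (2.1, 4.9), (4.0, 2.0)]
```

Three of the seven points suffice for $\varepsilon = 0.1$; the dominated point (3.0, 6.0) is covered automatically because anything covering its dominator also covers it.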

Preference-based learning also extends to combinatorial and Boolean optimization domains, with minimal correction subset (MCS) enumeration algorithms yielding exact or $\varepsilon$-approximate Pareto sets with rigorous guarantees (Guerreiro et al., 2022).

6. Preference Articulation, Reduction, and Trade-off Exploration

In applications, complete Pareto sets may be infeasible or unnecessary. Approaches have been developed to prune or focus solution sets using user-supplied preference information:

Axiomatic reduction: Incorporating "quanta of information" about relative criterion importance yields a reduced Pareto set by suitable transformation of objectives and repeated solution under the refined criterion (Zakharov et al., 2018, Zakharov et al., 2018).
Interactive exploration: Learned or approximated Pareto set models allow users to specify and adjust preferences in real time, returning solutions from the learned manifold $\{h_\theta(\lambda) : \lambda \in \Delta^{m-1}\}$ or from the MoE-fused models $\theta(\lambda)$ (Lin et al., 2022, Tang et al., 2024).
Preference-optimized sampling: Bilevel problems that select preference points to maximize diversity and coverage of the Pareto front, sometimes with entropy- or angular-penalty regularization, have recently shown strong performance in black-box, expensive-optimization settings (Haishan et al., 2024).
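As a much simpler stand-in for coverage-oriented preference selection, the sketch below greedily picks preference vectors that are spread out by angle (a max-min, farthest-point heuristic). It is not the bilevel formulation of Haishan et al. (2024); it only illustrates the angular-diversity idea. The candidate grid, the starting vector, and the function names are assumptions.

```python
import math
from typing import List, Sequence

Vec = List[float]


def angle(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in radians between two nonzero preference vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))


def select_diverse_preferences(candidates: List[Vec], k: int) -> List[Vec]:
    """Greedy max-min angular selection: start from the first candidate, then repeatedly
    add the candidate whose smallest angle to the already-selected set is largest."""
    selected = [candidates[0]]
    remaining = candidates[1:]
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda c: min(angle(c, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


candidates = [[i / 10, 1 - i / 10] for i in range(11)]  # grid on the 2-objective simplex
print(select_diverse_preferences(candidates, k=3))  # [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
```

After the two extreme preferences, the balanced vector is chosen because it maximizes the smallest angle (45 degrees) to the vectors already selected.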

Preference-based techniques are thus central to both practical and theoretical advances in multi-objective optimization, providing mechanisms for efficient, scalable, and user-aligned solution discovery.

7. Methodological Innovations and Open Directions

Preference-based learning continues to evolve, integrating with advanced surrogate modeling, evolutionary computation, combinatorial enumeration, and neural techniques:

Polynomial and sum-of-squares (SOS) relaxations for global, smooth approximations of the Pareto set (Magron et al., 2014, Gorissen et al., 2015).
Viability theory and set-valued dynamic programming for nonconvex or disconnected Pareto frontiers (Guigue, 2012).
Trust-region and density-based strategies for uniform, well-distributed Pareto set coverage, especially in black-box and derivative-free scenarios (Ju et al., 2022).
Piecewise-linear, simplicial-complex, and mesh-based reconstructions enabling topologically faithful Pareto set approximations (Lovison, 2010).
The emergence of learning-based frameworks (PSL and preference-optimized PSL, PO-PSL) for continuous, interactive, and computationally efficient exploration of high-dimensional trade-off spaces (Lin et al., 2022, Haishan et al., 2024, Tang et al., 2024).

Open challenges include formalizing regret or convergence bounds for bilevel and preference-optimized PSL, scaling exact or approximation guarantees to very high dimensions or objectives, integrating richer forms of user-feedback or behavioral preference queries, and unifying disparate computational paradigms for broader applicability.

Preference-based learning remains a dynamic, deeply mathematical, and practically critical area within multi-objective optimization, with continued research focusing on both rigorous theoretical foundations and scalable, interactive methods for real-world decision support.