Preference-Based Optimization

Updated 28 May 2026

Preference-based optimization is a framework that relies on ordinal comparisons instead of direct scalar evaluations.
It employs surrogate models like Gaussian Processes and Radial Basis Functions to infer latent utility and balance exploration with exploitation.
Applications include human-in-the-loop design, multi-objective decision-making, and large language model alignment in constrained scenarios.

Preference-based optimization (PBO) refers to a class of algorithms that optimize an objective function using only comparative, ordinal, or preference feedback between alternatives, rather than direct access to cardinal values of the objective. The paradigm spans a wide range of applications, including black-box engineering design, human-in-the-loop calibration, combinatorial optimization, and LLM alignment. PBO models a decision-maker's or user's implicit utility through systematic comparison queries, drawing on utility theory, surrogate modeling, probabilistic and Bayesian inference, and statistical learning. Recent advances have produced algorithmic and theoretical frameworks that provide global convergence guarantees and extend to constrained, multi-objective, and cost-sensitive scenarios.

1. Formalization and Theoretical Foundations

In preference-based optimization, the goal is to identify $\mathbf{x}^*$ in a feasible domain $\Omega$ that maximizes (for utility) or minimizes (for loss) a latent, unobservable function $u(\mathbf{x})$ (or $f(\mathbf{x}) = -u(\mathbf{x})$) using only pairwise (or higher-order) comparative feedback. Formally, given a preference oracle that returns $b = \pi(\mathbf{x}_i, \mathbf{x}_j) \in \{-1, 0, +1\}$, where $-1$ means $\mathbf{x}_i$ is preferred, $+1$ means $\mathbf{x}_j$ is preferred, and $0$ means indifference, the optimizer iteratively queries the decision space and infers a utility representation. Utility-theoretic axioms (rationality, continuity, completeness) guarantee the existence of a continuous utility function $u: \Omega \to \mathbb{R}$ representing subjective preferences (Previtali et al., 2022). PBO frameworks thus generalize classical black-box optimization, providing unified machinery for cases where only relative preference data is accessible (Previtali et al., 2022).
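In benchmark studies the oracle $\pi$ is typically simulated from a hidden test function. The following Python sketch assumes minimization of a latent $f$ (so $f(\mathbf{x}_i) < f(\mathbf{x}_j)$ means $\mathbf{x}_i$ is preferred) and a hypothetical indifference tolerance `tol`; the helper names `make_oracle` and `rosenbrock` are illustrative, not taken from the cited works.

```python
import numpy as np


def make_oracle(f, tol=0.0):
    """Build a simulated preference oracle pi(x_i, x_j) from a hidden function f
    to be minimized. tol is a hypothetical indifference threshold."""
    def pi(x_i, x_j):
        diff = f(x_i) - f(x_j)
        if abs(diff) <= tol:
            return 0                      # indifference
        return -1 if diff < 0 else +1     # -1: x_i preferred, +1: x_j preferred
    return pi


def rosenbrock(x):
    """Rosenbrock benchmark; global minimum 0 at (1, ..., 1)."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2))


pi = make_oracle(rosenbrock, tol=1e-3)
print(pi([1.0, 1.0], [0.0, 0.0]))   # -1: the minimizer (1, 1) is preferred
print(pi([0.0, 0.0], [0.0, 0.0]))   # 0: identical points are indifferent
```

The optimizer never sees `rosenbrock` itself, only the outputs of `pi`, which is exactly the information restriction that defines PBO.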

The foundational statistical models for preference likelihoods include the Bradley-Terry and Thurstone-Mosteller models, together with generalizations that handle indifference or noisy and tied feedback (Erarslan et al., 7 Nov 2025, Dewancker et al., 2018). Bayesian and frequentist surrogate models, typically Gaussian processes or radial basis function expansions, encode the evolving belief over the latent utility landscape.
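Both classical likelihoods depend only on the latent utility gap $u(\mathbf{x}_i) - u(\mathbf{x}_j)$. The sketch below uses only the Python standard library; the unit noise scale and the tie model (a symmetric indifference band of half-width `delta` around the perceived gap, in the spirit of a JND threshold) are illustrative assumptions rather than the exact formulations of the cited works.

```python
import math


def std_normal_cdf(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def bradley_terry(u_i, u_j):
    """P(x_i preferred over x_j): logistic function of the utility gap (overflow-safe)."""
    d = u_i - u_j
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def thurstone_mosteller(u_i, u_j, sigma=1.0):
    """P(x_i preferred over x_j) when each utility is perceived with independent
    N(0, sigma^2) noise, so the perceived gap has standard deviation sigma * sqrt(2)."""
    return std_normal_cdf((u_i - u_j) / (sigma * math.sqrt(2.0)))


def probit_with_ties(u_i, u_j, delta, sigma=1.0):
    """Probabilities of (x_i preferred, tie, x_j preferred) when a perceived gap
    inside the band [-delta, +delta] is reported as a tie."""
    s = sigma * math.sqrt(2.0)
    d = u_i - u_j
    p_i = 1.0 - std_normal_cdf((delta - d) / s)   # perceived gap above +delta
    p_j = std_normal_cdf((-delta - d) / s)        # perceived gap below -delta
    return p_i, 1.0 - p_i - p_j, p_j
```

With `delta = 0` the tie probability vanishes and `probit_with_ties` reduces to the Thurstone-Mosteller model; widening the band shifts probability mass toward reported ties.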

2. Surrogate Modeling, Acquisition Strategies, and Algorithmic Structures

PBO methods employ surrogate models to interpolate and extrapolate user preferences throughout the decision space. These surrogates can be:

Radial Basis Function (RBF) Surrogates: Fitted to satisfy observed pairwise preferences via quadratic programming, with slack variables to handle inconsistencies and non-transitivity (Previtali et al., 2022).
Gaussian Process (GP) Models: Probabilistic surrogates over the latent utility, updated via variational inference or Laplace approximations from ordinal feedback, and able to model observation noise and perceptual indifference bands (Erarslan et al., 7 Nov 2025, Dewancker et al., 2018).
Integrated Bayesian Preference Models: Explicitly model tie tolerance via Just Noticeable Difference (JND) thresholds (Erarslan et al., 7 Nov 2025), or extend the likelihood with tie-sensitive categorical terms (Dewancker et al., 2018).
Acquisition Functions: Candidate points are selected sequentially by maximizing acquisition criteria that balance exploitation (choosing points with the best predicted utility) and exploration (sampling poorly explored regions or points of maximal model uncertainty) (Previtali et al., 2022, Previtali et al., 2022).
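A minimal NumPy sketch of a preference-constrained RBF fit follows. It is a simplified stand-in for the quadratic program rather than the cited formulation: it assumes a Gaussian kernel, penalizes constraint violations quadratically instead of through linear slack costs, and minimizes the resulting smooth convex objective by plain gradient descent. Preferences use the $\pi$ encoding of Section 1 with $\hat f$ to be minimized; `eps`, `sigma` (the preference margin), and `lam` are illustrative defaults, and all function names are hypothetical.

```python
import numpy as np


def rbf_matrix(X, C, eps):
    """Gaussian RBF features exp(-(eps * ||x - c||)^2) for every row x of X and c of C."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-(eps ** 2) * d2)


def fit_preference_rbf(X, prefs, eps=1.0, sigma=0.1, lam=1e-3, iters=2000):
    """Weights beta of f_hat(x) = sum_k beta_k * exp(-(eps * ||x - x_k||)^2) such that
    (i, j, -1) -> f_hat(x_i) <= f_hat(x_j) - sigma   (x_i preferred, lower is better)
    (i, j, +1) -> f_hat(x_j) <= f_hat(x_i) - sigma
    (i, j,  0) -> |f_hat(x_i) - f_hat(x_j)| <= sigma
    Violations get a squared penalty plus a ridge term lam/2 * ||beta||^2."""
    X = np.asarray(X, dtype=float)
    Phi = rbf_matrix(X, X, eps)
    i, j, b = (np.array(col) for col in zip(*prefs))
    A = Phi[i] - Phi[j]                      # row h maps beta to f_hat(x_i) - f_hat(x_j)
    step = 1.0 / (lam + np.linalg.norm(A, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    strict = b != 0
    beta = np.zeros(len(X))
    for _ in range(iters):
        d = A @ beta
        viol = np.where(strict, np.maximum(0.0, sigma - b * d),
                        np.maximum(0.0, np.abs(d) - sigma))
        coef = np.where(strict, -b * viol, np.sign(d) * viol)  # d(viol^2 / 2) / dd
        beta -= step * (lam * beta + A.T @ coef)
    return beta


def predict(Xq, X, beta, eps=1.0):
    """Evaluate the fitted surrogate at query points Xq."""
    return rbf_matrix(np.asarray(Xq, dtype=float), np.asarray(X, dtype=float), eps) @ beta
```

For three 1-D samples `[[0.0], [0.5], [1.0]]` with latent $f(x) = (x - 0.3)^2$, the preferences `[(0, 1, +1), (0, 2, -1), (1, 2, -1)]` and `eps=2.0` yield a surrogate that orders the samples as $f$ does. The step size $1/L$ with $L = \lambda + \lVert A \rVert_2^2$ bounds the gradient's Lipschitz constant, so no line search is needed.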

Representative algorithmic schemas include:

Greedy δ-cycling: Cycling the exploration weight δ through a sequence of exploitative and explorative settings prevents stagnation in local optima; because the sequence includes pure exploration steps, global convergence is guaranteed (Previtali et al., 2022).
Cost-aware Information-Theoretic Acquisition: Integrating explicit production and evaluation costs, as well as perceptual ambiguity, into mutual information–based acquisition (Erarslan et al., 7 Nov 2025).
Batch and Iterative Querying with Adaptive Acquisition: Supporting both single-step and batched sampling, which suits settings with expensive evaluations or limited human availability (Previtali et al., 2022).
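The exploration/exploitation trade-off and δ-cycling can be sketched over a finite candidate set. Assumptions: the exploration term $z(\mathbf{x})$ is an inverse-distance-weighting function with weights $e^{-d^2}/d^2$ of the kind used in GLIS-type methods, the acquisition is the combination $a(\mathbf{x}) = (1-\delta)\,\hat f_n(\mathbf{x}) - \delta\, z(\mathbf{x})$ with $\hat f_n$ the surrogate rescaled to $[0,1]$, and `DELTA_CYCLE` is a hypothetical schedule. GLISp-r's exact acquisition and cycle may differ.

```python
import numpy as np


def idw_exploration(Xc, X):
    """IDW exploration term z(x): 0 at sampled points, growing toward 1 far from them."""
    Xc, X = np.asarray(Xc, dtype=float), np.asarray(X, dtype=float)
    d2 = ((Xc[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    z = np.zeros(len(Xc))
    unsampled = (d2 > 0).all(axis=1)
    w = np.exp(-d2[unsampled]) / d2[unsampled]     # weights exp(-d^2) / d^2
    with np.errstate(divide="ignore"):             # sum(w) == 0 gives arctan(inf) = pi/2
        z[unsampled] = (2.0 / np.pi) * np.arctan(1.0 / w.sum(axis=1))
    return z


def next_query(Xc, X, f_hat_c, delta):
    """Index of the candidate minimizing (1 - delta) * f_hat_n(x) - delta * z(x)."""
    f_hat_c = np.asarray(f_hat_c, dtype=float)
    span = f_hat_c.max() - f_hat_c.min()
    f_n = (f_hat_c - f_hat_c.min()) / span if span > 0 else np.zeros_like(f_hat_c)
    a = (1.0 - delta) * f_n - delta * idw_exploration(Xc, X)
    return int(np.argmin(a))


# Hypothetical schedule from pure exploration (1.0) to pure exploitation (0.0).
DELTA_CYCLE = [1.0, 0.7, 0.35, 0.0]


def delta_at(k):
    """Exploration weight used at iteration k (cycles through DELTA_CYCLE)."""
    return DELTA_CYCLE[k % len(DELTA_CYCLE)]
```

At iteration `k` the next point would be `Xc[next_query(Xc, X, f_hat_c, delta_at(k))]`, which is then compared with the current best through $\pi$. Because δ = 1 recurs in every cycle, the sampled set keeps spreading through the domain, which is the mechanism the convergence argument relies on.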

3. Extensions: Constraints, Multi-Objective and Group Preferences

PBO has been extended to handle constraints (often unknown or user-reported), multi-objective criteria, and group decision-making:

Constrained Optimization: Surrogates for feasibility and satisfaction probabilities are learned in parallel (using inverse-distance weighting or GP classifiers), and novel acquisition functions trade off objective improvement, exploration, and risk of invalid samples (Zhu et al., 2021).
Multi-objective and Group Decision-Making: All system performances and stakeholder desiderata are mapped into a unified preference domain through valid Preference Function Modeling (PFM), and a single aggregated preference (often via affine aggregation of z-scores) is maximized, enabling a unique, group-optimal solution (Wolfert, 19 Mar 2026).
Soft and Hard Constraints: Integrated into the solution space and acquisition logic, with the admissible region given by the intersection of physical feasibility and acceptability, as determined by stakeholder-specified thresholds or learned preference boundaries (Wolfert, 19 Mar 2026, Zhu et al., 2021).
Dynamic Architectural Adaptation: Seamless handling of constraints, actor weights, and admissible solution spaces via the ODESYS methodology and IMAP solver (Wolfert, 19 Mar 2026).
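Two of the ingredients above can be sketched compactly. The first function interpolates binary feasibility labels with inverse-distance weights (weights $1/d^2$ are assumed here), the second adds a hypothetical penalty `rho` for the estimated risk of an infeasible sample to a minimized acquisition, and the third aggregates per-stakeholder preference ratings by a weighted sum of z-scores. These are illustrative stand-ins under stated assumptions, not the exact C-GLISp or ODESYS/IMAP formulations.

```python
import numpy as np


def idw_feasibility(Xc, X, feasible):
    """Inverse-distance-weighted estimate of P(feasible) from labels (1 = feasible)."""
    Xc, X = np.atleast_2d(Xc).astype(float), np.atleast_2d(X).astype(float)
    y = np.asarray(feasible, dtype=float)
    d2 = ((Xc[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    p = np.empty(len(Xc))
    for k, row in enumerate(d2):
        hit = row == 0
        if hit.any():                  # at a sampled point, return its observed label
            p[k] = y[hit][0]
        else:
            w = 1.0 / row              # assumed weights 1 / d^2
            p[k] = w @ y / w.sum()
    return p


def penalized_acquisition(a, p_feas, rho=10.0):
    """Add a hypothetical penalty rho * (1 - p_feas) to a minimized acquisition."""
    return np.asarray(a, dtype=float) + rho * (1.0 - np.asarray(p_feas, dtype=float))


def group_score(scores, weights):
    """Affine aggregation of z-scores. scores: (n_stakeholders, n_alternatives)."""
    S = np.asarray(scores, dtype=float)
    sd = S.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0                  # an indifferent stakeholder contributes 0
    Z = (S - S.mean(axis=1, keepdims=True)) / sd
    return np.asarray(weights, dtype=float) @ Z
```

Standardizing each stakeholder's ratings before weighting keeps one actor's rating scale from dominating the aggregate; the group choice is then the `argmax` of `group_score`.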

4. Practical Implementations, Empirical Results, and Applications

Recent PBO algorithms, such as GLISp-r, gMRS, and CPBO, report strong empirical performance on standard benchmarks (e.g., the Ackley, Bukin #6, and Rosenbrock functions) and on real-world calibration tasks. Key attributes include:

Convergence and Robustness: δ-cycling and pure exploration steps prove critical for avoiding premature convergence and guarantee convergence to the global optimum of a continuous latent function over a compact domain (Previtali et al., 2022, Previtali et al., 2022).
Sample Efficiency: PBO methods reach near-optimal solutions with few queries, a valuable property in human-in-the-loop and costly evaluation settings (Erarslan et al., 7 Nov 2025).
Application Domains:
- Human-in-the-loop Design: Rapid calibration of exoskeleton gaits, controller synthesis, and subjective optimization tasks using CoSpar, PrefOpt, and mixed-initiative PBO (Tucker et al., 2019, Dewancker et al., 2018).
- LLM Alignment: Training generative models to match user preferences or suppress undesired behaviors using preference optimization objectives (e.g., DPO, RePO, BOPO) (Wu et al., 10 Mar 2025, Liao et al., 10 Mar 2025, Kim et al., 26 May 2025).
- Group-Optimal Decision-Making: ODESYS/FIVES for consensus among actors with heterogeneous or conflicting objectives, validated in multi-agent engineering scenarios (Wolfert, 19 Mar 2026).
- Constrained and Multi-objective Settings: Automated controller tuning, synthetic and real-case optimization with hard and soft utility/feasibility constraints (Zhu et al., 2021).
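The link between LLM preference optimization and the Bradley-Terry likelihood of Section 1 is easiest to see in the DPO loss, which is the negative Bradley-Terry log-likelihood of the observed preference when a response's implicit utility is its scaled log-probability ratio against a frozen reference model. A minimal NumPy sketch, assuming the summed sequence log-probabilities have already been computed (model and tokenizer code omitted); `beta` is the DPO temperature, with 0.1 as an illustrative value, and the other cited variants are not shown.

```python
import numpy as np


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Mean DPO loss over a batch. logp_* are summed log-probabilities of the preferred (w)
    and rejected (l) responses under the trained policy; ref_logp_* under the reference."""
    margin = beta * ((np.asarray(logp_w) - np.asarray(ref_logp_w))
                     - (np.asarray(logp_l) - np.asarray(ref_logp_l)))
    return float(np.mean(np.logaddexp(0.0, -margin)))   # stable -log(sigmoid(margin))
```

When the policy equals the reference the margin is zero and the loss is $\log 2$; raising the preferred response's log-ratio relative to the rejected one lowers it.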

5. Connections to and Generalizations of Classical Methods

PBO generalizes and unifies classical black-box and derivative-free optimization, reinforcement learning, and Bayesian optimization. Surrogate-based acquisition strategies, uncertainty modeling, and variational inference are adapted for the preference domain, allowing methods from one regime (e.g., expected improvement, Thompson sampling) to be reinterpreted or retooled for ordinal data (Dewancker et al., 2018, Previtali et al., 2022).
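For instance, Thompson sampling carries over to the preference setting by sampling the latent utility rather than an observed objective. A minimal sketch, assuming a Gaussian approximation to the posterior over a finite candidate set (for example, from a Laplace-approximated preference GP) is already available as `mean` and `cov`; pairing the sampled maximizer with the current incumbent is one common design choice, not the only one, and the function name is hypothetical.

```python
import numpy as np


def thompson_challenger(mean, cov, incumbent, rng):
    """Draw one joint utility sample over the candidates from N(mean, cov) and return the
    index of its maximizer, excluding the incumbent so the query is a real comparison."""
    sample = rng.multivariate_normal(mean, cov)
    sample[incumbent] = -np.inf
    return int(np.argmax(sample))
```

The next query is then $\pi(\mathbf{x}_{\text{incumbent}}, \mathbf{x}_{\text{challenger}})$; posterior uncertainty drives exploration because uncertain candidates occasionally win the sampled draw.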

Recent work advances these frameworks by:

Incorporating Richer Preference Feedback: Modeling ties, indifference thresholds, and noisy responses (Erarslan et al., 7 Nov 2025).
Extending to Multi-agent Settings: Formalizing conditions for preference-keyed, integrated, associative, and unique group-optimal outcomes (Wolfert, 19 Mar 2026).
Unifying Surrogate Construction: Treating black-box and preference-based inputs interchangeably under the same algorithmic skeleton (Previtali et al., 2022).
Enabling Algorithmic Flexibility: Support for RBF/GP surrogates, pure-exploit and pure-explore modes, and cost-aware criteria.

6. Limitations, Open Problems, and Future Directions

While PBO has matured considerably, several challenges and frontiers remain:

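Scalability to High Dimensions: Surrogate and acquisition updates become expensive as data accumulate: exact GP inference scales cubically with the number of observations, and RBF-based QPs grow polynomially in size and cost with the sample size (Previtali et al., 2022). Optimizing the acquisition function also becomes harder as the dimension of $\Omega$ increases.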
Preference Query Design: Reducing cognitive burden and maximizing informativeness of queries is largely open, especially in complex or high-dimensional decision spaces (Erarslan et al., 7 Nov 2025).
Dynamic and Non-stationary Preferences: Modeling and accommodating time-varying or state-dependent preferences requires further exploration.
Extensions to Multi-way and Ordinal Queries: Most PBO frameworks focus on pairwise comparison; richer forms of feedback (partial rankings, scores, etc.) are less systematically handled.
Integrated Learning of Constraints and Preferences: Simultaneously handling uncertain feasibility, human acceptability, and non-stationary resource constraints in a joint optimization loop (Zhu et al., 2021, Wolfert, 19 Mar 2026).
Real-Time and Interactive Systems: Efficiently blending PBO methods with real-time inference and interactive design remains a promising area, especially as LLMs and user-centric tools (LAPPI) bridge user language and optimization inputs (Kuroki et al., 16 Dec 2025).
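A common way to reuse pairwise machinery for richer feedback is rank breaking: expanding a (possibly tied) ranking into every implied pairwise comparison. A minimal sketch using the $\pi$ encoding of Section 1; the function name and the tier-list input format are assumptions. Note that the expansion treats the $k(k-1)/2$ implied pairs as independent observations, which overstates the information in a single ranking and is one reason such feedback is not handled trivially.

```python
from itertools import combinations


def ranking_to_pairs(ranking):
    """Expand a ranking into pairwise preferences (i, j, b) with b = -1 meaning item i is
    preferred. ranking is a list of tiers, best first; items within a tier are tied (b = 0)."""
    pairs = []
    for t, tier in enumerate(ranking):
        pairs.extend((a, c, 0) for a, c in combinations(tier, 2))   # ties within a tier
        for worse in ranking[t + 1:]:
            pairs.extend((a, c, -1) for a in tier for c in worse)   # tier beats lower tiers
    return pairs
```

For example, `ranking_to_pairs([[2], [0, 3], [1]])` yields six comparisons, one for each pair of the four items, including the tie `(0, 3, 0)`.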

In summary, preference-based optimization synthesizes elements from utility theory, surrogate modeling, Bayesian inference, and statistical learning to provide a robust, flexible framework for optimization when only comparative feedback is accessible. Its theoretical foundations, proven convergence results, and versatility across domains make it a core methodology for human-centered design, alignment, and decision-making in contemporary computational sciences and engineering (Previtali et al., 2022, Erarslan et al., 7 Nov 2025, Wolfert, 19 Mar 2026, Zhu et al., 2021).