Preference-Based Optimization Algorithm
- Preference-based optimization algorithms are methods that optimize latent objective functions using only ordinal feedback rather than explicit numerical evaluations.
- They construct surrogate models—such as RBF, Gaussian Process, or piecewise affine functions—that encode multi-level and certainty-guided preference information to approximate hidden utility landscapes.
- These algorithms balance exploration and exploitation through adaptive acquisition functions, ensuring efficient convergence in complex engineering and human-in-the-loop applications.
A preference-based optimization algorithm is an iterative procedure for optimizing an unknown or inaccessible objective function using only preference information—typically pairwise, ordinal judgments—between candidate solutions rather than direct numerical evaluations. Such algorithms are increasingly critical in domains where the cost function is latent, multi-faceted (e.g., human comfort, aesthetics, or multi-criteria tradeoffs), or costly or impossible to quantify explicitly. They exploit preference feedback—potentially involving graded levels and certainty scoring—to construct surrogates of the latent utility landscape, and leverage exploration–exploitation tradeoffs to efficiently query new candidate solutions.
1. Formulation of Preference-Based Optimization Problems
In the general setting, one seeks the minimizer (or maximizer) $x^*$ of a latent objective $f:\Omega\to\mathbb{R}$ over a feasible set $\Omega$, often under black-box constraints; $f$ itself is not directly available. Instead, one obtains preference outcomes $\pi(x_1, x_2)$, which may be:
- discrete ordinal scales (e.g., a 5-point Likert: “much better”, “slightly better”, “as good as”, etc.) possibly coupled with a certainty level quantized on a multi-level scale,
- noisy binary pairwise comparisons, informing only $\pi(x_1, x_2) = -1$ ($x_1$ preferred) or $\pi(x_1, x_2) = 1$ ($x_2$ preferred).
The problem is then re-cast as seeking
$x^* = \arg\min_{x \in \Omega} f(x)$
with the only access being preference queries and outcomes (possibly with multiple grades and certainty weights). For the optimizer $x^*$, one requires
$\pi(x^*, x) \le 0 \quad \forall x \in \Omega \quad \text{(no other feasible point is preferred over } x^*\text{)}$
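As a concrete instance of this query model, a minimal one-bit preference oracle for a hidden objective can be sketched in Python; the function names, tolerance, and test objective are illustrative, not taken from any cited paper:

```python
import numpy as np

def preference_oracle(f, x1, x2, tol=1e-6):
    """One-bit preference outcome pi(x1, x2) for a hidden objective f
    (minimization): -1 if x1 is preferred, +1 if x2 is preferred,
    0 if the two are indistinguishable within tol."""
    d = f(x1) - f(x2)
    if d < -tol:
        return -1   # x1 preferred (lower latent cost)
    if d > tol:
        return +1   # x2 preferred
    return 0        # "as good as"

# Hidden objective that the optimizer never evaluates directly.
f = lambda x: float(np.sum((np.asarray(x) - 0.3) ** 2))
print(preference_oracle(f, [0.3, 0.3], [1.0, 1.0]))  # -1: the first point is preferred
```

The optimizer interacts with $f$ only through such calls, never through its numerical values.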
2. Surrogate Construction and Preference Encoding
Preference-based optimization algorithms universally rely on building a surrogate $\hat f$ mapping the decision space to $\mathbb{R}$ to approximate the “hidden” utility $f$, exploiting observed preferences as constraints. Prominent approaches are:
- Radial Basis Function (RBF) Surrogates: Constructed as
$\hat f(x) = \sum_{k=1}^{N} \beta_k\, \phi(\epsilon \lVert x - x_k \rVert)$
with $\phi(r) = 1/(1 + r^2)$ (inverse-quadratic RBF), shape parameter $\epsilon$, and weights $\beta_k$ estimated via a convex program that enforces surrogate differences to fall within bands consistent with observed preferences and their certainty scores. Multi-level outcomes and certainty produce multiple interval constraints per query, and associated slack penalties scale with certainty (Dao et al., 2023).
- Gaussian Process (GP) Surrogates: Place a GP prior over $f$ and fit it using ordinal likelihoods, e.g., Thurstone–Mosteller probit models for pairwise comparisons:
$\Pr(x_1 \succ x_2) = \Phi\!\left(\frac{f(x_2) - f(x_1)}{\sqrt{2}\,\sigma}\right)$
where $\Phi$ is the standard normal CDF and $\sigma$ models preference noise (Tucker et al., 2019).
- Piecewise Affine Surrogates (PWA): For mixed variable problems, construct surrogates as maxima over affine regions, subject to MILP-feasible constraints (Zhu et al., 2023).
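The Thurstone–Mosteller probit likelihood for a set of pairwise outcomes can be evaluated directly; a minimal sketch, assuming a latent cost vector over observed points (lower is better) and an illustrative noise scale:

```python
import numpy as np
from scipy.stats import norm

def tm_probit_loglik(f_vals, comparisons, noise=0.1):
    """Log-likelihood of pairwise outcomes under a Thurstone-Mosteller probit
    model for a latent *cost* f (lower is better): the probability that x_i is
    preferred over x_j is Phi((f[j] - f[i]) / (sqrt(2) * noise))."""
    ll = 0.0
    for i, j in comparisons:                 # each pair records: i was preferred over j
        p = norm.cdf((f_vals[j] - f_vals[i]) / (np.sqrt(2.0) * noise))
        ll += np.log(max(p, 1e-12))          # guard against log(0) for extreme outcomes
    return ll
```

In a GP setting this likelihood replaces the Gaussian observation model, and the latent values are inferred, e.g., by Laplace approximation.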
Surrogate fitting incorporates slack variables for inconsistency/noise and certainty-weighted penalties, enabling robust modeling in the presence of uncertain or pluralistic feedback (Dao et al., 2023).
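Certainty-weighted, preference-constrained RBF fitting can be sketched as a smooth squared-hinge relaxation of the band-constrained convex program; the margin, ridge weight, and shape parameter below are illustrative placeholders, not the exact formulation of any cited paper:

```python
import numpy as np
from scipy.optimize import minimize

def rbf_matrix(X, centers, eps=1.0):
    # Inverse-quadratic RBF: phi(r) = 1 / (1 + (eps * r)^2)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    return 1.0 / (1.0 + (eps * d) ** 2)

def fit_preference_rbf(X, prefs, eps=1.0, margin=0.01, lam=1e-3):
    """Fit RBF weights so the surrogate respects each recorded preference
    (i preferred over j) with a certainty-scaled squared-hinge penalty."""
    Phi = rbf_matrix(X, X, eps)

    def loss(beta):
        fhat = Phi @ beta
        pen = sum(w * max(fhat[i] - fhat[j] + margin, 0.0) ** 2 for i, j, w in prefs)
        return pen + lam * beta @ beta   # ridge term keeps the fit well-posed

    beta = minimize(loss, np.zeros(len(X)), method="BFGS").x
    return beta, lambda Xq: rbf_matrix(np.atleast_2d(Xq), X, eps) @ beta

# Three 1-D samples; sample 0 preferred over 1 (certainty 1.0), 1 over 2 (0.5).
X = np.array([[0.0], [0.5], [1.0]])
prefs = [(0, 1, 1.0), (1, 2, 0.5)]
beta, fhat = fit_preference_rbf(X, prefs)
```

Higher certainty weights penalize violations of the corresponding preference more heavily, mirroring the role of certainty-scaled slack penalties in the exact program.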
3. Acquisition and Exploration–Exploitation Strategies
The acquisition function determines which new candidate to propose for preference querying, balancing regions deemed desirable by the surrogate and unexplored areas. Typical forms are:
- Surrogate–Exploration blends:
$a(x) = \hat f(x) - \delta\, z(x)$
where $z(x)$ is, for instance, an arctangent of the reciprocal sum of inverse fourth powers of distances to previous points (zero at prior sampled locations), and $\delta$ is an adaptive exploration weight responsive to improvement in search (Dao et al., 2023). Other schemes use min–max normalized surrogate and exploration terms with an adjustable bi-criteria trade-off parameter (Previtali et al., 2022).
- Purely gradient-free pairwise SGD: Construct a gradient estimator from comparison feedback by uniform-sphere perturbation, e.g.,
$\hat g = \frac{d}{\mu}\, s\, u$
where $s \in \{-1, +1\}$ is the one-bit comparison result between the perturbed points $x \pm \mu u$ and $u$ is sampled from the unit sphere in $\mathbb{R}^d$, updating $x$ via gradient descent. Convergence to stationary points is established under standard smoothness and variance assumptions (Wang et al., 20 Dec 2025).
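A single comparison-based SGD step can be sketched as follows; the $d/\mu$ scaling follows standard sphere-smoothing gradient estimators, while the step sizes, oracle, and objective are illustrative assumptions:

```python
import numpy as np

# Hidden objective; the optimizer sees only one-bit comparisons of point pairs.
f_hidden = lambda x: float(np.sum(x ** 2))
oracle = lambda a, b: 1 if f_hidden(a) > f_hidden(b) else -1  # sign of f(a) - f(b)

def pairwise_sgd_step(x, compare, mu=0.1, lr=0.01, rng=None):
    """One comparison-based SGD step: perturb along a uniform unit-sphere
    direction u, query the one-bit outcome between x + mu*u and x - mu*u,
    and take a descent step along the implied direction."""
    rng = np.random.default_rng() if rng is None else rng
    d = len(x)
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                    # uniform direction on the unit sphere
    s = compare(x + mu * u, x - mu * u)       # +1 when the second point is preferred
    g_hat = (d / mu) * s * u                  # one-bit gradient estimate
    return x - lr * g_hat

rng = np.random.default_rng(0)
x = np.array([1.0, 1.0])
for _ in range(300):                          # iterate; x drifts toward the minimizer
    x = pairwise_sgd_step(x, oracle, rng=rng)
```

With a constant step size the iterate dithers in a neighborhood of the optimum; diminishing steps are needed for exact convergence.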
Acquisition rules often adaptively trade off surrogate exploitation against exploration, cycling between global and local search (Previtali et al., 2022). For multi-objective or combinatorial problems, batch acquisition and scalarization (e.g., augmented Chebyshev) are employed for Pareto front exploration (Astudillo et al., 2024).
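The surrogate–exploration blend described above can be sketched directly; the distance guard and the default exploration weight are illustrative choices:

```python
import numpy as np

def idw_exploration(x, X_seen):
    """Exploration term matching the description in the text: arctangent of
    the reciprocal of the sum of inverse fourth powers of distances to the
    previously sampled points; identically zero at sampled locations."""
    d = np.linalg.norm(X_seen - x, axis=1)
    if np.any(d < 1e-12):
        return 0.0
    return float(np.arctan(1.0 / np.sum(d ** -4)))

def acquisition(x, surrogate, X_seen, delta=1.0):
    # Lower is better: exploit low surrogate values, reward distance from samples.
    return surrogate(x) - delta * idw_exploration(x, X_seen)

X_seen = np.array([[0.0, 0.0], [1.0, 1.0]])
```

Minimizing this acquisition over a candidate set selects points that either look good under the surrogate or lie far from everything sampled so far.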
4. Handling Multi-Level Outcomes, Certainty, and Constraints
Modern Preference-Based Optimization algorithms generalize simple binary comparison by incorporating richer human judgment:
- Multi-level outcomes: Each query produces a set of preference grades (e.g., from a Likert scale), inducing multiple band constraints per preference in surrogate fitting. Certainty is encoded as weight factors in the convex program, such that high-certainty judgments create tight surrogate bands while uncertain outcomes impose looser constraints (Dao et al., 2023).
- Unknown constraints: Surrogates of feasibility and satisfaction are learned (e.g., via inverse-distance weighting from binary labels), with penalties for infeasible or unsatisfactory points in the acquisition function to guide safe exploration (Zhu et al., 2021).
For mixed categorical and numerical variables, piecewise affine surrogates enable feasible optimization via mixed-integer programming (Zhu et al., 2023).
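The mapping from a graded outcome plus certainty level to a band constraint on surrogate differences can be sketched as a lookup; the numeric band edges and weights here are illustrative placeholders, not values from any cited paper:

```python
def preference_bands(outcome, certainty):
    """Map a 5-point preference outcome to a band (lo, hi) for the surrogate
    difference fhat(x1) - fhat(x2), plus a certainty-scaled penalty weight."""
    bands = {
        "much better":     (float("-inf"), -0.5),
        "slightly better": (-0.5, -0.05),
        "as good as":      (-0.05, 0.05),
        "slightly worse":  (0.05, 0.5),
        "much worse":      (0.5, float("inf")),
    }
    lo, hi = bands[outcome]
    weight = {1: 0.25, 2: 0.5, 3: 1.0}[certainty]  # more certain -> heavier penalty
    return lo, hi, weight
```

Each query thus contributes one band constraint per grade, with slack violations penalized in proportion to the reported certainty.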
5. Theoretical Guarantees and Empirical Performance
- Global convergence is established in frameworks where exploration functions guarantee infinitely many pure-exploration steps, ensuring sample density and convergence to the global optimum under continuity conditions (Previtali et al., 2022).
- Convergence rates for pairwise-SGD methods are established under standard smoothness and stochasticity assumptions (Wang et al., 20 Dec 2025).
- Surrogate-based methods can leverage certainty and multi-level outcomes to reduce worst-/average-case distances to the true optimum by $20\%$ or more compared to binary-only baselines across benchmark problems (Dao et al., 2023).
- Incorporation of uncertainty weighting and advanced constraint surrogates leads to robust near-optimal solution finding in practical engineering tasks, e.g., exoskeleton gait personalization and MPC controller calibration, with high data-efficiency (Tucker et al., 2019, Zhu et al., 2021).
Preference-based optimization is demonstrably effective for black-box engineering calibration, multi-objective design, and human-in-the-loop tasks where scalar reward is inaccessible.
6. Algorithmic Structure and Practical Implementation
A prototypical preference-based optimization loop includes:
- Initialization: Random or space-filling design to generate initial candidates; initial preferences collected.
- Surrogate update: Solve a convex program or probabilistic model fitting that encodes the observed preferences (possibly with multi-level, certainty, and constraint labels).
- Acquisition optimization: Solve for the next candidate via minimization of the acquisition function blending surrogate and exploration terms.
- Querying: Present candidate(s) for preference judgment, possibly collecting multi-level/certainty feedback.
- Update sample set and exploration weights; possibly adapt surrogate hyperparameters (e.g., via LOOCV).
- Repeat until budget exhausted; return best sample as estimated optimum.
Pseudocode reflecting these steps appears in (Dao et al., 2023, Previtali et al., 2022).
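The loop above can be condensed into a compact, self-contained one-dimensional sketch; the squared-hinge surrogate fit, shape parameter, margin, and exploration weight are illustrative simplifications of the exact convex programs in the cited works:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
f_hidden = lambda x: (x - 0.3) ** 2                 # latent objective, oracle-only access
oracle = lambda a, b: -1 if f_hidden(a) < f_hidden(b) else 1  # -1: first point preferred

phi = lambda r: 1.0 / (1.0 + (2.0 * r) ** 2)        # inverse-quadratic RBF

def fit_surrogate(X, prefs, lam=1e-3, margin=1.0):
    """Squared-hinge relaxation of preference-constrained RBF fitting."""
    Phi = phi(np.abs(X[:, None] - X[None, :]))
    def loss(beta):
        fh = Phi @ beta
        return sum(max(fh[i] - fh[j] + margin, 0.0) ** 2 for i, j in prefs) + lam * beta @ beta
    beta = minimize(loss, np.zeros(len(X)), method="BFGS").x
    return lambda xq: phi(np.abs(xq[:, None] - X[None, :])) @ beta

# Initialization: two random candidates and one preference query.
X = rng.uniform(0.0, 1.0, 2)
best = 0 if oracle(X[0], X[1]) == -1 else 1
prefs = [(best, 1 - best)]                          # (i, j): sample i preferred over j

grid = np.linspace(0.0, 1.0, 201)                   # candidate pool for acquisition
for _ in range(5):
    fhat = fit_surrogate(X, prefs)                  # surrogate update
    d = np.abs(grid[:, None] - X[None, :])
    z = np.zeros_like(grid)                         # IDW-style exploration term
    m = d.min(axis=1) > 1e-9
    z[m] = np.arctan(1.0 / (d[m] ** -4).sum(axis=1))
    x_new = grid[int(np.argmin(fhat(grid) - 0.5 * z))]  # acquisition optimization
    X = np.append(X, x_new)
    k = len(X) - 1
    if oracle(x_new, X[best]) == -1:                # query new candidate vs incumbent
        prefs.append((k, best)); best = k
    else:
        prefs.append((best, k))
```

Since each new sample is compared against the incumbent, `X[best]` is always the true best sample found so far, and is returned when the budget is exhausted.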
7. Comparison to Alternative and State-of-the-Art Methods
Preference-based methods generalize and often improve upon pure black-box optimization processes by:
- Handling settings where direct objective evaluation is impossible or unreliable.
- Flexibly encoding human judgment, including variable intensity and certainty.
- Avoiding pitfalls of reward hacking, scalarization bias, or ill-posed numerical criteria.
- Enabling robust performance under noise and uncertainty via adaptive slack penalization and weighted fitting (Dao et al., 2023).
- Outperforming binary-only preference frameworks in practical settings and achieving significant improvements in sample efficiency and convergence to preferred solutions.
AmPL, gMRS, GLISp-r, CoSpar, and their variants are characterized by lightweight surrogate fitting via convex optimization, straightforward implementation in engineering workflows, and modular integration of multi-level preference structures and uncertainty weighting (Dao et al., 2023, Tucker et al., 2019, Previtali et al., 2022).