Human-in-the-Loop Optimization
- Human-in-the-Loop Optimization is a method that integrates human feedback into iterative search processes to address non-differentiable, subjective, and dynamic objectives.
- Algorithmic frameworks such as Bayesian optimization, reinforcement learning, and gradient-free control leverage pairwise comparisons and real-time feedback to refine solution spaces.
- Empirical studies demonstrate that incorporating human input can improve system efficiency (e.g., a reported 87% reduction in manipulation cost) and support personalized outcomes.
Human-in-the-Loop Optimization (HILO) refers to a class of optimization methodologies and systems where a human provides feedback—either explicitly or implicitly—during the iterative search for an optimal solution. Rather than being fully automated, these algorithms deliberately incorporate human expertise, preferences, or judgments to guide, shape, or correct the optimization process. This strategy has seen systematization in a range of domains, including robotics, experimental design, AI-based control, user interface adaptation, neuroprosthetics, and interpretability-driven machine learning.
1. Principles and Motivations of Human-in-the-Loop Optimization
Standard optimization algorithms are designed to search for a maximum or minimum of a (possibly unknown or expensive-to-evaluate) objective function, with candidate solutions evaluated entirely by automated metrics or simulators. HILO departs from this paradigm by integrating human-driven signals into some or all of the following aspects:
- Objective (loss) definition or adaptation
- Feedback delivery (e.g., pairwise comparison, scoring, selection)
- Solving non-differentiable or hard-to-model objectives (such as subjective quality, clinical sensation)
- Interactive curation of constraints or solution space exploration
- Real-time adjustment to evolving or underspecified goals
HILO can be formalized in various ways, but a recurring formulation is

$$x^{*} = \arg\max_{x \in \mathcal{X}} \; u\big(f(x)\big),$$

where $f$ is either an automated process or physical system, and $u$ denotes human evaluation, often represented as a latent, unknown, possibly non-differentiable utility function. In many cases, human input is provided as relative (preferential) feedback rather than absolute scores, better aligning with human judgment and cognitive ergonomics (Wang et al., 2 Jun 2025).
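The loop this formulation implies can be sketched in a few lines of Python. Everything below (the function names, the propose/prefer interfaces, the toy utility) is illustrative rather than drawn from any cited system; it only shows how a latent utility $u$ enters the search purely through comparisons:

```python
import random

def hilo_loop(propose, human_prefers, n_iters=20):
    """Generic human-in-the-loop optimization skeleton.

    propose(incumbent) -> a new candidate solution x (automated search step)
    human_prefers(a, b) -> True if the human judges a better than b
    (a stand-in for the latent utility u, which is never observed directly)
    """
    incumbent = propose(None)            # initial candidate
    for _ in range(n_iters):
        challenger = propose(incumbent)  # automated process f generates a candidate
        # Relative (preferential) feedback instead of an absolute score
        if human_prefers(challenger, incumbent):
            incumbent = challenger
    return incumbent

# Toy usage: the "human" secretly prefers values near 3.0 on the real line
if __name__ == "__main__":
    secret_utility = lambda x: -(x - 3.0) ** 2
    best = hilo_loop(
        propose=lambda x: random.uniform(0, 6) if x is None else x + random.gauss(0, 0.5),
        human_prefers=lambda a, b: secret_utility(a) > secret_utility(b),
    )
    print(f"Selected candidate: {best:.2f}")
```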
Key motivations for HILO include overcoming limitations of pure automation in perception, preference, or creative exploration, addressing ill-posed objectives, achieving personalization, enabling adaptivity, and ensuring system safety and usability in contexts where automated metrics are inadequate or infeasible.
2. Algorithmic Frameworks and Feedback Modalities
The algorithmic design of HILO typically extends black-box optimization paradigms—Bayesian optimization (BO), model predictive control (MPC), and reinforcement learning (RL)—to allow for direct human feedback in the optimization loop.
Bayesian Optimization and Pairwise Preference
Preferential Bayesian Optimization (PBO) models the unknown human utility function via a probabilistic surrogate, incorporating binary or multi-way human choices between candidate solutions. As shown in several works, PBO often adopts a Bradley–Terry–Luce-style preference model with a probit link, where the probability of preferring candidate $x$ over $x'$ is

$$P(x \succ x') = \Phi\big(g(x) - g(x')\big),$$

with $g$ modeled by a Gaussian process (GP) and $\Phi$ the standard normal cdf (Granley et al., 2023, Schoinas et al., 31 Jan 2025).
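A minimal numerical sketch of this preference likelihood follows; a complete PBO implementation would additionally fit the latent $g$ from the observed duels (e.g., via a Laplace approximation over the GP posterior) and optimize an acquisition function to choose the next duel:

```python
import numpy as np
from scipy.stats import norm

def preference_probability(g_x, g_xp):
    """P(x > x') = Phi(g(x) - g(x')) under the probit-style preference model."""
    return norm.cdf(g_x - g_xp)

def preference_log_likelihood(g, duels):
    """Log-likelihood of observed duels [(i, j), ...], each meaning candidate i
    was preferred to candidate j, given latent utility values g over the
    candidate set."""
    return sum(np.log(norm.cdf(g[i] - g[j])) for i, j in duels)

# Toy check: a latent utility that increases with candidate index
g = np.array([0.0, 0.5, 1.5])
print(preference_probability(g[2], g[0]))             # high: x2 clearly preferred
print(preference_log_likelihood(g, [(2, 0), (1, 0)]))
```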
Gradient-Free Online Controllers
When used for real-time control in physical systems, HILO may employ controller updates based on finite-difference estimates of the gradient, using only binary (pairwise) human preference:
$$\theta_{k+1} = \theta_k + \alpha\, s_k\, \delta_k, \qquad s_k = \begin{cases} +1, & \theta_k + \delta_k \succ \theta_k - \delta_k,\\ -1, & \text{otherwise}, \end{cases}$$

where the random direction $\delta_k$ enables local exploration and the update mimics a stochastic gradient step using only binary human feedback (Wang et al., 2 Jun 2025).
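A schematic version of such an update is below, with hypothetical step sizes and interfaces; the cited work's exact update rule and stability safeguards may differ:

```python
import numpy as np

def preference_gradient_step(theta, human_prefers, step=0.1, probe=0.05, rng=None):
    """One gradient-free controller update driven by a binary human preference.

    A random probe direction delta yields two candidate parameter settings;
    the human's pairwise choice supplies the sign of a finite-difference-style
    stochastic gradient step. (Schematic sketch only.)
    """
    rng = rng or np.random.default_rng()
    delta = rng.standard_normal(theta.shape)
    delta /= np.linalg.norm(delta)            # unit random direction
    plus, minus = theta + probe * delta, theta - probe * delta
    sign = 1.0 if human_prefers(plus, minus) else -1.0
    return theta + step * sign * delta

# Toy usage: the "human" prefers parameters closer to the target [1, -2]
target = np.array([1.0, -2.0])
prefers = lambda a, b: np.linalg.norm(a - target) < np.linalg.norm(b - target)
theta = np.zeros(2)
for _ in range(200):
    theta = preference_gradient_step(theta, prefers)
print(theta)  # hovers near the target
```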
High-Throughput, Batch, and Multi-Objective Formulations
Expert-guided Bayesian Optimization for experimental design presents batches of diverse candidate solutions at each step, constructed via multi-objective optimization that balances utility and diversity:
$$\max_{\{x_1,\dots,x_q\}} \; \Big[\; \textstyle\sum_{i=1}^{q} u(x_i),\;\; \det K(x_1,\dots,x_q) \;\Big]$$

Here, $\sum_i u(x_i)$ is the sum of utility values, $\det K$ is the determinant of the covariance matrix among the candidates, and the human selects a discrete point for evaluation (Savage et al., 2023).
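An enumeration-based sketch of this batch construction is shown below. It scalarizes the two objectives with a hypothetical weight rather than solving the full multi-objective problem and presenting the Pareto set to the expert, as the cited work does:

```python
import numpy as np
from itertools import combinations

def rbf_kernel(X, lengthscale=1.0):
    """Squared-exponential covariance matrix among candidate points."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def best_batch(candidates, utility, q=3, weight=1.0):
    """Pick a batch of q candidates scoring well on summed utility plus the
    log-determinant of the covariance among the batch, which rewards
    diversity. (Brute-force sketch; impractical beyond small candidate sets.)"""
    best_score, best_idx = -np.inf, None
    for idx in combinations(range(len(candidates)), q):
        X = candidates[list(idx)]
        score = utility[list(idx)].sum() + weight * np.linalg.slogdet(rbf_kernel(X))[1]
        if score > best_score:
            best_score, best_idx = score, idx
    return best_idx

# Toy usage: 8 one-dimensional candidates with random utilities
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(8, 1))
u = rng.normal(size=8)
print(best_batch(X, u, q=3))  # indices of a high-utility, diverse batch
```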
Personalized Continual Learning Frameworks
Continual HILO leverages accumulated experience from prior users by maintaining an evolving Bayesian neural network (BNN) to model population-level characteristics, combined with user-specific Gaussian process surrogates. Generative replay and variance filtering prevent catastrophic forgetting while ensuring adaptation (Liao et al., 7 Mar 2025).
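A stripped-down sketch of this population-plus-personal decomposition follows, with plain GP regression on residuals standing in for the BNN and generative-replay machinery of the cited framework; all names and hyperparameters here are illustrative:

```python
import numpy as np

def gp_posterior_mean(X_train, y_train, X_test, prior_mean,
                      lengthscale=1.0, noise=1e-2):
    """GP regression on the residuals left after subtracting a population-level
    prior mean (a stand-in for the BNN in the cited framework)."""
    def k(A, B):
        d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
        return np.exp(-0.5 * d2 / lengthscale**2)
    resid = y_train - prior_mean(X_train)
    K = k(X_train, X_train) + noise * np.eye(len(X_train))
    return prior_mean(X_test) + k(X_test, X_train) @ np.linalg.solve(K, resid)

# Toy usage: the population model captures a rough trend; the user-specific
# GP corrects it from a handful of that user's own observations.
population_mean = lambda X: 0.5 * X[:, 0]        # learned across prior users
X_user = np.array([[0.0], [1.0], [2.0]])
y_user = np.array([0.3, 1.2, 2.1])               # this user runs a bit higher
X_query = np.array([[1.5]])
print(gp_posterior_mean(X_user, y_user, X_query, population_mean))
```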
3. System Architectures and Human-Assistive Cues
Many HILO systems are deployed in settings where the human operates either as a supervisor, a collaborative decision maker, or an evaluator:
- Mixed-Initiative Robotics: Human operators control the pre-grasp phase of a robotic arm using a haptic interface; an autonomous agent continually injects force cues based on gradients of a task-relevant cost function (such as the Task-Oriented Velocity Manipulability, TOV) to nudge the human toward grasps that facilitate efficient downstream manipulation (Esfahani et al., 2017).
- Interactive Design and Creative Systems: In generative melody composition, a BO system proposes candidate musical fragments; the human provides preferential selection and optionally edits top candidates, with the system updating its latent-space GP from these choices (Zhou et al., 2020). GUI toolkits for XR design (AUIT) allow designers to specify high-level adaptation objectives, which are optimized via multi-objective solvers for in-situ adaptation of interface parameters (Jansen, 13 May 2025).
- Clinical and Biomedical Applications: For visual prostheses and exoskeletons, HILO guides online personalization of high-dimensional control parameters to human preferences or physical outcomes. Patient choices or gait error metrics are used to update neural network encoders or exoskeleton controllers via Bayesian optimization, sometimes deploying online empirical models to counter model uncertainties or threshold misspecifications (Granley et al., 2023, Schoinas et al., 31 Jan 2025, Qian et al., 23 Mar 2025).
- HVAC Control Systems: A Markov Decision Process (MDP) framework combines RL with human override feedback, integrating real-time sensor and occupant data to optimize energy consumption and comfort (Liang et al., 9 May 2025).
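As a rough illustration of the last item, here is a toy tabular Q-learning loop in which an occupant override both replaces the agent's action and contributes a hypothetical comfort penalty, so the agent learns to avoid states that trigger interventions. The state, action, and reward design of the cited system are far richer; this is a schematic sketch only:

```python
import random
from collections import defaultdict

def q_learning_with_override(env_step, human_override, n_steps=1000,
                             actions=(0, 1), alpha=0.1, gamma=0.9, eps=0.1,
                             override_penalty=1.0):
    """Tabular Q-learning where a human may override the chosen action."""
    Q = defaultdict(float)
    state = 0
    for _ in range(n_steps):
        action = (random.choice(actions) if random.random() < eps
                  else max(actions, key=lambda a: Q[(state, a)]))
        overridden = human_override(state, action)
        if overridden is not None:
            action, penalty = overridden, override_penalty  # human takes over
        else:
            penalty = 0.0
        next_state, reward = env_step(state, action)
        target = reward - penalty + gamma * max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = next_state
    return Q

# Toy usage: two thermal states (0 = comfortable, 1 = too warm); action 1 = cool.
def env_step(s, a):
    if s == 1 and a == 1:
        return 0, 1.0      # cooling fixes discomfort
    if s == 0 and a == 1:
        return 0, -0.2     # cooling when comfortable wastes energy
    return (1 if random.random() < 0.3 else s), 0.0

occupant = lambda s, a: 1 if (s == 1 and a == 0) else None  # overrides to "cool"
Q = q_learning_with_override(env_step, occupant)
```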
4. Types of Feedback and Human Roles
HILO systems may employ a range of feedback modalities:
- Pairwise preference / binary comparisons (e.g., “Which image is clearer?” or “Which forecast is more accurate?”).
- Discrete selection among alternatives (e.g., batch mode optimization in experimental design).
- Ranking or explicit scoring of candidates.
- Implicit behavioral signals (e.g., override events, action corrections, observed participation).
- Natural language curation or constraint specification, translated via LLMs into optimization objectives or design constraints (Jin et al., 2023, Tiomoko et al., 21 May 2025).
Depending on the specific instantiation, humans may serve as:
- Objective arbiters (solely providing feedback signals for optimization)
- Co-planners (jointly refining, editing, or constraining candidate solutions)
- Supervisors (overriding or accepting suggestions)
- Meta-optimizers (curating or updating the optimization’s design or constraint space)
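In practice, many of the modalities listed above can be normalized into a common preference-pair representation before reaching the optimizer. The mapping below is a hypothetical sketch, not drawn from the cited systems; real pipelines would attach modality-specific noise models rather than treating all pairs identically:

```python
from itertools import combinations

def to_preference_pairs(feedback):
    """Map heterogeneous feedback events onto (winner, loser) pairs."""
    pairs = []
    kind, payload = feedback
    if kind == "pairwise":            # explicit duel: (winner, loser)
        pairs.append(payload)
    elif kind == "selection":         # chosen item beats every other shown item
        chosen, shown = payload
        pairs.extend((chosen, other) for other in shown if other != chosen)
    elif kind == "ranking":           # best-to-worst list implies all ordered pairs
        pairs.extend(combinations(payload, 2))
    elif kind == "override":          # human's corrective action beats the agent's
        agent_action, human_action = payload
        pairs.append((human_action, agent_action))
    return pairs

print(to_preference_pairs(("ranking", ["a", "b", "c"])))
# [('a', 'b'), ('a', 'c'), ('b', 'c')]
```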
5. Empirical Validation and Systemic Benefits
The efficacy of HILO has been empirically validated in multiple domains:
- Teleoperation and Robotics: Experiments integrating TOV-based force cues yielded up to an 87% reduction in manipulation cost, with task efficiency improving monotonically as the operator followed the haptically conveyed gradients (Esfahani et al., 2017).
- Simultaneous Localization and Mapping (SLAM): Human corrections embedded in factor graphs reduced map inconsistency by up to 91% even with few inputs, through EM-based inference and COP-SLAM back-propagation (Nashed et al., 2017).
- Clinical and Perceptual Personalization: User studies with simulated prosthetic vision and exoskeletons showed that human-guided optimization consistently yields higher subjective and objective quality compared to naive or automated baselines; the approach was robust to noisy inputs, model misspecification, and out-of-distribution parameters (Schoinas et al., 31 Jan 2025, Qian et al., 23 Mar 2025).
- Design and Usability: Human-guided search enables broader design space exploration, improves performance, and (in some settings) reduces cognitive load, though it may reduce agency or expressiveness when algorithmic guidance constrains creative control (Chan et al., 2022, Jansen, 13 May 2025).
Benchmarks consistently indicate that human input—when well integrated—elevates both the quantitative and qualitative outcomes of the optimization process, accelerates convergence (especially in early iterations), and supports adaptation to personalized or evolving objectives.
6. Challenges: Human Factors, Stability, and Real-World Robustness
Several challenges are inherent to HILO:
- Instability of Human Judgment: Experimental work demonstrates that human ratings can be inconsistent, contradictory, or subject to cognitive biases (anchoring, loss aversion, availability effects), violating the (often implicit) assumption that the human utility function is consistent and stationary (Ou et al., 2022). These issues can destabilize the optimization process. Mitigation includes designing UI features such as history views, context frames, and visual aids to reduce various types of decision noise.
- Interaction with Plant/System Dynamics: In coupled systems (e.g., when controller optimization and plant response interact), the online learning loop must account for the plant’s transient behavior, stability, and error propagation. Theoretical analyses employ Lyapunov techniques to bound convergence and characterize steady-state errors introduced by dynamics and stochastic gradient approximations (Wang et al., 2 Jun 2025).
- Expertise-Dependent Patterns: Studies reveal that novices reach high objective performance in fewer iterations, terminate the search sooner, and report higher satisfaction; experts interact longer, report lower satisfaction, and pursue more diverse or maximizing exploration. Designers are encouraged to adapt the interaction protocol and optimization loop to the user's domain expertise (Ou et al., 2023).
- Scalability and Continual Learning: As systems are personalized for numerous users, continual learning strategies (e.g., BNNs with generative replay) are needed to aggregate experience without catastrophic forgetting and to enable efficient real-time adaptation for new users (Liao et al., 7 Mar 2025).
- Transparency and User Agency: Increased automation and optimization guidance may reduce user agency, creativity, and sense of ownership. Solutions involve mixed-initiative designs, dynamic control-sharing, and transparency mechanisms that communicate the rationale for suggestions or actions (Chan et al., 2022, Jansen, 13 May 2025).
7. Future Directions and Open Problems
Key areas for future development include:
- Integration of dynamic and physical system properties (beyond pure kinematic or static models), particularly in robotics and assistive devices (Esfahani et al., 2017, Qian et al., 23 Mar 2025).
- Richer forms of human feedback, including natural language, visual explanations, and context-aware constraints mediated by advanced LLMs (Jin et al., 2023, Tiomoko et al., 21 May 2025).
- Adaptive human-in-the-loop optimization protocols that account for changing user expertise, changing system dynamics, and multi-objective trade-offs (Ou et al., 2023, Savage et al., 2023).
- Scalable, population-level continual optimization that balances population priors and personalized (user-specific) adaptations, with generative replay to prevent model forgetting (Liao et al., 7 Mar 2025).
- Theoretical analysis of convergence and sub-optimality under real-world constraints, dynamics, non-stationary feedback, and noisy human signals (Wang et al., 2 Jun 2025).
- Design of bidirectional learning environments (e.g., for human–robot symbiosis and rehabilitation) where both the AI system and user adapt to each other through continuous mutual feedback (Chen et al., 11 Feb 2025).
In summary, Human-in-the-Loop Optimization is a rapidly developing research area enabling optimization strategies to access domains and objectives that are otherwise intractable or ambiguous to automate. The architecture, theory, and application contexts are diverse, but all require careful algorithmic design to robustly and efficiently integrate the strengths—and address the idiosyncrasies—of human feedback.