- The paper introduces a novel method that combines response times with binary choices to better capture the strength of user preferences.
- It leverages the EZ-diffusion model to convert utility estimation into a linear regression problem, boosting computational efficiency.
- Empirical results show that incorporating response times significantly speeds up learning, especially for queries with strong, clear-cut preferences ("easy" queries).
Leveraging Human Response Time in Preference-based Linear Bandits
The paper "Enhancing Preference-based Linear Bandits via Human Response Time" addresses a significant challenge in interactive preference learning systems: binary human choices reveal which option a user prefers but say little about how strongly. Traditional preference-based bandit algorithms rely on binary feedback because it is simple and places a low cognitive demand on the user. However, this coarseness can hinder the efficacy of learning algorithms in applications such as recommendation systems, assistive robots, and assortment optimization.
To overcome this hurdle, the authors propose a novel approach that incorporates human response times alongside choice data to enrich the feedback. Response time has been shown to inversely correlate with preference strength; quick decisions are typically tied to strong preferences, while slower responses often indicate weaker preferences. The authors integrate the EZ-diffusion model, which jointly models human choices and response times, with preference-based linear bandits. A key contribution of the paper is a computationally efficient utility estimator that uses both choice and response time data, transforming the utility estimation problem into a linear regression problem.
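The reduction to linear regression can be made concrete with the EZ-diffusion moment formulas. Below is a minimal sketch (assuming unit diffusion noise, zero non-decision time, and a known symmetric barrier; the function and variable names are illustrative, not from the paper):

```python
import math

def ez_ddm_stats(drift, barrier):
    """EZ-diffusion moments for a symmetric DDM with unit noise:
    probability of hitting the upper barrier, and mean decision time."""
    p_up = 1.0 / (1.0 + math.exp(-2.0 * barrier * drift))
    mean_t = (barrier / drift) * math.tanh(barrier * drift)  # assumes drift != 0
    return p_up, mean_t

# Key identity: E[2C - 1] / E[T] = drift / barrier.
# Coding each observed (choice, response time) pair as y = (2c - 1) / t
# therefore yields a noisy linear measurement of the utility difference,
# so the utility vector can be fit by ordinary least squares.
v, a = 0.8, 1.5
p, t = ez_ddm_stats(v, a)
assert abs((2 * p - 1) / t - v / a) < 1e-12
```

The identity holds because 2p - 1 = tanh(a·v) and E[T] = (a/v)·tanh(a·v), so the tanh terms cancel in the ratio.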
Theoretical analysis and empirical evidence illustrate the advantage of this approach, especially for queries that elicit strong preferences, referred to as "easy" queries. For such queries the choice is nearly deterministic, so each observed choice carries little information on its own; response times, by contrast, still vary with preference strength and offer rich, complementary information. Incorporating them therefore turns easy queries, previously of limited value, into useful drivers of preference learning.
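A quick numeric check with the same EZ-diffusion formulas makes the "easy query" point concrete (unit noise, barrier a = 1, zero non-decision time; the specific drift values are illustrative, not from the paper):

```python
import math

a = 1.0  # assumed barrier (illustrative)
for drift in (2.0, 4.0):  # two "easy" queries of different strengths
    p = 1.0 / (1.0 + math.exp(-2.0 * a * drift))  # choice probability
    t = (a / drift) * math.tanh(a * drift)        # mean decision time
    print(f"drift={drift}: P(choice)={p:.3f}, E[T]={t:.3f}")
# drift=2.0: P(choice)=0.982, E[T]=0.482
# drift=4.0: P(choice)=1.000, E[T]=0.250
```

Both queries yield the same choice almost every time, so the choices are nearly indistinguishable; the mean response time, however, roughly halves, cleanly separating the two preference strengths.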
The paper supports these claims with simulations based on real-world datasets, which demonstrate that the proposed estimator significantly speeds up learning when response times are taken into account. The research further contributes to fixed-budget best-arm identification by developing variants of the Generalized Successive Elimination algorithm that incorporate response times.
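The fixed-budget setting can be pictured as a successive-halving loop that, in each phase, fits the utility vector from (choice, response time) regression targets and discards the lower-scoring half of the arms. The simulation below is a rough sketch under assumed dynamics (EZ-diffusion mean response time with log-normal noise; all names and parameters are illustrative), not the paper's exact Generalized Successive Elimination algorithm or generative model:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([1.0, 0.5])       # hypothetical true utility vector
arms = rng.normal(size=(8, 2))     # arm feature vectors
a = 1.0                            # barrier, assumed known

def query(diff):
    """Simulate one pairwise comparison: an EZ-diffusion choice plus a
    noisy response time for utility difference v = diff @ theta."""
    v = diff @ theta
    p = 1.0 / (1.0 + np.exp(-2.0 * a * v))
    c = rng.random() < p
    mean_t = (a / v) * np.tanh(a * v) if v != 0 else a**2
    t = mean_t * np.exp(0.1 * rng.normal())  # multiplicative RT noise
    return (2 * int(c) - 1) / t              # regression target, mean ~ v / a

def successive_elimination(arms, budget, rounds=3):
    """Toy fixed-budget elimination: regress, score, halve the arm set."""
    live = list(range(len(arms)))
    per_round = budget // rounds
    for _ in range(rounds):
        X, y = [], []
        for _ in range(per_round):
            i, j = rng.choice(live, size=2, replace=False)
            X.append(arms[i] - arms[j])
            y.append(query(arms[i] - arms[j]))
        est, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
        scores = arms[live] @ est
        keep = np.argsort(scores)[len(live) // 2:]  # drop the worse half
        live = [live[k] for k in sorted(keep)]
    return live

survivors = successive_elimination(arms, budget=600)
print(survivors)  # a single surviving arm index
```

With 8 arms and 3 halving rounds, exactly one arm survives; how reliably it is the true best arm depends on the budget and the response-time noise, which is precisely the trade-off the paper analyzes.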
Implications and Future Directions
The immediate practical implication of this research is its potential to optimize preference-based interactive systems by leveraging the usually neglected metric of response time. This can lead to more efficient algorithms that require fewer queries to converge on a user's preferences. Theoretically, this work contributes to a deeper understanding of how cognitive models of decision-making can be integrated into computational frameworks for preference learning.
One suggested future direction is to explore richer psychological models for more nuanced interpretations of human decision processes. Additionally, while the paper assumes a known non-decision time, future work could develop models that are robust to unknown or noisy non-decision times, extending applicability to real-world settings such as crowdsourcing.
Finally, integrating this approach with other types of feedback mechanisms, such as eye-tracking or physiological sensors, could further enhance the richness of the preference signals received, driving even more efficient learning in recommendation systems, robotics, and beyond. The insights from this paper could be foundational in bridging algorithmic approaches with cognitive science, providing a framework that could enhance the development of human-centered AI systems.