Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents (2407.01887v2)

Published 2 Jul 2024 in cs.LG, cs.AI, and cs.CL

Abstract: In-context decision-making is an important capability of artificial general intelligence, which LLMs have effectively demonstrated in various scenarios. However, LLMs often face challenges when dealing with numerical contexts, and limited attention has been paid to evaluating their performance through preference feedback generated by the environment. This paper is the first to investigate the performance of LLMs as decision-makers in the context of Dueling Bandits (DB). We compare GPT-3.5 Turbo, GPT-4, GPT-4 Turbo, Llama 3.1, and o1-preview against eight well-established DB algorithms. Our results reveal that LLMs, particularly GPT-4 Turbo, quickly identify the Condorcet winner, thus outperforming existing state-of-the-art algorithms in terms of weak regret. Nevertheless, LLMs struggle to converge even when explicitly prompted to do so, and they are sensitive to prompt variations. To overcome these issues, we introduce a hybrid algorithm, LLM-Enhanced Adaptive Dueling (LEAD), which takes advantage of both the in-context decision-making capabilities of LLMs and the theoretical guarantees inherited from classic DB algorithms. We show that LEAD has theoretical guarantees on both weak and strong regret and validate its robustness even under noisy and adversarial prompts. The design of such an algorithm sheds light on how to enhance the trustworthiness of LLMs used in decision-making tasks where performance robustness matters.
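To make the setting concrete: in dueling bandits, the learner selects a pair of arms each round and observes only a noisy preference outcome (which arm won the duel). With a Condorcet winner a* and gaps Δ_k = P(a* beats k) − 1/2, weak regret charges min(Δ_i, Δ_j) per round (zero if either chosen arm is a*), while strong regret charges max(Δ_i, Δ_j) (zero only if both are). The minimal Python sketch below illustrates this feedback model and the two regret notions; the greedy proposal rule here is a hypothetical stand-in for the LLM component, not the paper's LEAD algorithm.

```python
# Minimal dueling-bandits sketch. Assumptions: a fixed 3-arm preference
# matrix P with a Condorcet winner; the greedy proposal rule below is an
# illustrative stand-in for an LLM's suggestion, not the paper's LEAD.
import numpy as np

rng = np.random.default_rng(0)

# P[i, j] = probability that arm i beats arm j in a single duel.
# Arm 0 is the Condorcet winner since P[0, j] > 0.5 for all j != 0.
P = np.array([
    [0.5, 0.7, 0.8],
    [0.3, 0.5, 0.6],
    [0.2, 0.4, 0.5],
])
K = P.shape[0]
condorcet = 0

def duel(i: int, j: int) -> int:
    """Return the index of the winning arm for one noisy comparison."""
    return i if rng.random() < P[i, j] else j

# Pairwise win counts; ones as an optimistic prior to avoid division by zero.
wins = np.ones((K, K))

weak_regret = strong_regret = 0.0
T = 1000
for t in range(T):
    p_hat = wins / (wins + wins.T)          # empirical preference estimates
    # Greedy proposal (stand-in for the LLM): arm with best average win rate.
    i = int(np.argmax(p_hat.mean(axis=1)))
    # Challenger: the arm we are least certain that i beats (excluding i).
    j = int(np.argmin(np.where(np.arange(K) == i, np.inf, p_hat[i])))
    winner = duel(i, j)
    loser = j if winner == i else i
    wins[winner, loser] += 1
    # Weak regret is zero if either arm is the Condorcet winner;
    # strong regret is zero only if both arms are.
    gaps = [P[condorcet, i] - 0.5, P[condorcet, j] - 0.5]
    weak_regret += min(gaps)
    strong_regret += max(gaps)

print(f"weak regret: {weak_regret:.1f}, strong regret: {strong_regret:.1f}")
```

Running the sketch shows the gap the paper highlights: weak regret stays small once the Condorcet winner keeps appearing in the chosen pair, but strong regret keeps growing until the pair converges to (a*, a*).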

Authors (4)
  1. Fanzeng Xia
  2. Hao Liu
  3. Yisong Yue
  4. Tongxin Li