Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash 92 tok/s
Gemini 2.5 Pro 47 tok/s Pro
GPT-5 Medium 32 tok/s
GPT-5 High 36 tok/s Pro
GPT-4o 88 tok/s
GPT OSS 120B 471 tok/s Pro
Kimi K2 220 tok/s Pro
2000 character limit reached

Balancing optimism and pessimism in offline-to-online learning (2502.08259v2)

Published 12 Feb 2025 in cs.LG and cs.AI

Abstract: We consider what we call the offline-to-online learning setting, focusing on stochastic finite-armed bandit problems. In offline-to-online learning, a learner starts with offline data collected from interactions with an unknown environment in a way that is not under the learner's control. Given this data, the learner begins interacting with the environment, gradually improving its initial strategy as it collects more data to maximize its total reward. The learner in this setting faces a fundamental dilemma: if the policy is deployed for only a short period, a suitable strategy (in a number of senses) is the Lower Confidence Bound (LCB) algorithm, which is based on pessimism. LCB can effectively compete with any policy that is sufficiently "covered" by the offline data. However, for longer time horizons, a preferred strategy is the Upper Confidence Bound (UCB) algorithm, which is based on optimism. Over time, UCB converges to the performance of the optimal policy at a rate that is nearly the best possible among all online algorithms. In offline-to-online learning, however, UCB initially explores excessively, leading to worse short-term performance compared to LCB. This suggests that a learner not in control of how long its policy will be in use should start with LCB for short horizons and gradually transition to a UCB-like strategy as more rounds are played. This article explores how and why this transition should occur. Our main result shows that our new algorithm performs nearly as well as the better of LCB and UCB at any point in time. The core idea behind our algorithm is broadly applicable, and we anticipate that our results will extend beyond the multi-armed bandit setting.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Ai Generate Text Spark Streamline Icon: https://streamlinehq.com

Paper Prompts

Sign up for free to create and run paper prompts using GPT-5.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube