A Simple Approach for Non-stationary Linear Bandits (2103.05324v2)

Published 9 Mar 2021 in cs.LG

Abstract: This paper investigates the problem of non-stationary linear bandits, where the unknown regression parameter is evolving over time. Existing studies develop various algorithms and show that they enjoy an $\widetilde{\mathcal{O}}(T^{{2/3}P_T^{1/3})$} dynamic regret, where $T$ is the time horizon and $P_T$ is the path-length that measures the fluctuation of the evolving unknown parameter. In this paper, we discover that a serious technical flaw makes their results ungrounded, and then present a fix, which gives an $\widetilde{\mathcal{O}}(T^{{3/4}P_T^{1/4})$} dynamic regret without modifying original algorithms. Furthermore, we demonstrate that instead of using sophisticated mechanisms, such as sliding window or weighted penalty, a simple restarted strategy is sufficient to attain the same regret guarantee. Specifically, we design an UCB-type algorithm to balance exploitation and exploration, and restart it periodically to handle the drift of unknown parameters. Our approach enjoys an $\widetilde{\mathcal{O}}(T^{{3/4}P_T^{1/4})$} dynamic regret. Note that to achieve this bound, the algorithm requires an oracle knowledge of the path-length $P_T$. Combining the bandits-over-bandits mechanism by treating our algorithm as the base learner, we can further achieve the same regret bound in a parameter-free way. Empirical studies also validate the effectiveness of our approach.

PDF Abstract

Summarize PDF Markdown Bookmark Chat (Pro)

Authors (4)

Peng Zhao (162 papers)
Lijun Zhang (239 papers)
Yuan Jiang (48 papers)
Zhi-Hua Zhou (126 papers)

Citations (75)

View on Semantic Scholar

A Simple Approach for Non-stationary Linear Bandits (2103.05324v2)

Related Papers