Improved Regret Bounds for Bandits with Expert Advice (2406.16802v1)

Published 24 Jun 2024 in cs.LG and stat.ML

Abstract: In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order $\sqrt{K T \ln(N/K)}$ for the worst-case regret, where $K$ is the number of actions, $N>K$ the number of experts, and $T$ the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of $\sqrt{K T (\ln N) / (\ln K)}$. For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.
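As a rough numerical illustration of the two lower-bound orders quoted in the abstract, the following minimal sketch evaluates $\sqrt{K T \ln(N/K)}$ and $\sqrt{K T (\ln N) / (\ln K)}$ with all constant factors dropped. The function names are hypothetical and not taken from the paper, and because the constants are omitted, the output indicates only how the two expressions scale, not the actual regret values.

```python
import math


def new_lower_bound_order(K: int, N: int, T: int) -> float:
    """Order sqrt(K * T * ln(N / K)) from the abstract (constants omitted)."""
    if not (N > K >= 2):
        raise ValueError("this sketch assumes N > K >= 2 so both logarithms are positive")
    return math.sqrt(K * T * math.log(N / K))


def prior_lower_bound_order(K: int, N: int, T: int) -> float:
    """Order sqrt(K * T * ln(N) / ln(K)) of the earlier lower bound (constants omitted)."""
    if not (N > K >= 2):
        raise ValueError("this sketch assumes N > K >= 2 so both logarithms are positive")
    return math.sqrt(K * T * math.log(N) / math.log(K))


def order_ratio(K: int, N: int) -> float:
    """Ratio new / prior; T and the leading K cancel, leaving sqrt(ln(N/K) * ln(K) / ln(N))."""
    return math.sqrt(math.log(N / K) * math.log(K) / math.log(N))


if __name__ == "__main__":
    T = 10_000  # illustrative horizon, an assumption for this example
    for K, N in [(10, 100), (10, 10_000), (100, 10_000)]:
        print(K, N, new_lower_bound_order(K, N, T), prior_lower_bound_order(K, N, T), order_ratio(K, N))
```

For example, when $N = K^2$ the ratio simplifies to $\sqrt{(\ln K)/2}$, so in that regime the gap between the two orders grows with the number of actions.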

Authors (4)

Nicolò Cesa-Bianchi
Khaled Eldowa
Emmanuel Esposito
Julia Olkhovskaya