
Best-of-Both-Worlds Algorithms for Partial Monitoring (2207.14550v3)

Published 29 Jul 2022 in cs.LG and stat.ML

Abstract: This study considers the partial monitoring problem with $k$-actions and $d$-outcomes and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded both in the stochastic and adversarial regimes. In particular, we show that for non-degenerate locally observable games, the regret is $O(m^2 k^4 \log(T) \log(k_{\Pi} T) / \Delta_{\min})$ in the stochastic regime and $O(m k^{2/3} \sqrt{T \log(T) \log k_{\Pi}})$ in the adversarial regime, where $T$ is the number of rounds, $m$ is the maximum number of distinct observations per action, $\Delta_{\min}$ is the minimum suboptimality gap, and $k_{\Pi}$ is the number of Pareto optimal actions. Moreover, we show that for globally observable games, the regret is $O(c_{\mathcal{G}}^2 \log(T) \log(k_{\Pi} T) / \Delta_{\min}^2)$ in the stochastic regime and $O((c_{\mathcal{G}}^2 \log(T) \log(k_{\Pi} T))^{1/3} T^{2/3})$ in the adversarial regime, where $c_{\mathcal{G}}$ is a game-dependent constant. We also provide regret bounds for a stochastic regime with adversarial corruptions. Our algorithms are based on the follow-the-regularized-leader framework and are inspired by the approach of exploration by optimization and the adaptive learning rate in the field of online learning with feedback graphs.

Authors (3)
  1. Taira Tsuchiya
  2. Shinji Ito
  3. Junya Honda
Citations (14)