
Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case (2102.05284v2)

Published 10 Feb 2021 in cs.LG and cs.AI

Abstract: We make significant progress toward the stochastic shortest path problem with adversarial costs and unknown transition. Specifically, we develop algorithms that achieve $\widetilde{O}(\sqrt{S^2ADT_\star K})$ regret for the full-information setting and $\widetilde{O}(\sqrt{S^3A^2DT_\star K})$ regret for the bandit feedback setting, where $D$ is the diameter, $T_\star$ is the expected hitting time of the optimal policy, $S$ is the number of states, $A$ is the number of actions, and $K$ is the number of episodes. Our work strictly improves (Rosenberg and Mansour, 2020) in the full information setting, extends (Chen et al., 2020) from known transition to unknown transition, and is also the first to consider the most challenging combination: bandit feedback with adversarial costs and unknown transition. To remedy the gap between our upper bounds and the current best lower bounds constructed via a stochastically oblivious adversary, we also propose algorithms with near-optimal regret for this special case.
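
As a rough illustration of how the two stated bounds relate, the sketch below evaluates $\sqrt{S^2ADT_\star K}$ and $\sqrt{S^3A^2DT_\star K}$ with all constants and logarithmic factors dropped. The function names and the numeric values of $S$, $A$, $D$, $T_\star$, and $K$ are hypothetical, chosen only for illustration; nothing here comes from the paper's algorithms or experiments.

```python
import math


def full_info_bound(S, A, D, T_star, K):
    """Leading term sqrt(S^2 * A * D * T_star * K); constants and log factors omitted."""
    return math.sqrt(S**2 * A * D * T_star * K)


def bandit_bound(S, A, D, T_star, K):
    """Leading term sqrt(S^3 * A^2 * D * T_star * K); constants and log factors omitted."""
    return math.sqrt(S**3 * A**2 * D * T_star * K)


# Hypothetical problem sizes, for illustration only.
S, A, D, T_star = 10, 4, 5.0, 8.0

# The bandit bound exceeds the full-information bound by a factor of sqrt(S * A).
ratio = bandit_bound(S, A, D, T_star, 10_000) / full_info_bound(S, A, D, T_star, 10_000)

# Both bounds grow like sqrt(K), so the bound divided by K shrinks as K grows.
avg_small_K = full_info_bound(S, A, D, T_star, 1_000) / 1_000
avg_large_K = full_info_bound(S, A, D, T_star, 100_000) / 100_000
```

Under these assumptions, the only difference between the two leading terms is the extra $\sqrt{SA}$ factor in the bandit setting, and both are sublinear in $K$.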

Authors (2)
  1. Liyu Chen (22 papers)
  2. Haipeng Luo (99 papers)
Citations (29)