Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
129 tokens/sec
GPT-4o
28 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

A Best-of-Both-Worlds Algorithm for Constrained MDPs with Long-Term Constraints (2304.14326v2)

Published 27 Apr 2023 in cs.LG

Abstract: We study online learning in episodic constrained Markov decision processes (CMDPs), where the learner aims at collecting as much reward as possible over the episodes, while satisfying some long-term constraints during the learning process. Rewards and constraints can be selected either stochastically or adversarially, and the transition function is not known to the learner. While online learning in classical (unconstrained) MDPs has received considerable attention over the last years, the setting of CMDPs is still largely unexplored. This is surprising, since in real-world applications, such as, e.g., autonomous driving, automated bidding, and recommender systems, there are usually additional constraints and specifications that an agent has to obey during the learning process. In this paper, we provide the first best-of-both-worlds algorithm for CMDPs with long-term constraints, in the flavor of Balseiro et al. (2023). Our algorithm is capable of handling settings in which rewards and constraints are selected either stochastically or adversarially, without requiring any knowledge of the underling process. Moreover, our algorithm matches state-of-the-art regret and constraint violation bounds for settings in which constraints are selected stochastically, while it is the first to provide guarantees in the case in which they are chosen adversarially.

Citations (10)

Summary

We haven't generated a summary for this paper yet.