Generalised Entropy MDPs and Minimax Regret
Abstract: Bayesian methods suffer from the problem of how to specify prior beliefs. One interesting idea is to consider worst-case priors. This requires solving a stochastic zero-sum game. In this paper, we extend well-known results from bandit theory in order to discover minimax-Bayes policies and discuss when they are practical.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.