Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring (1509.09011v1)

Published 30 Sep 2015 in stat.ML and cs.LG

Abstract: Partial monitoring is a general model for sequential learning with limited feedback formalized as a game between two players. In this game, the learner chooses an action and at the same time the opponent chooses an outcome, then the learner suffers a loss and receives a feedback signal. The goal of the learner is to minimize the total loss. In this paper, we study partial monitoring with finite actions and stochastic outcomes. We derive a logarithmic distribution-dependent regret lower bound that defines the hardness of the problem. Inspired by the DMED algorithm (Honda and Takemura, 2010) for the multi-armed bandit problem, we propose PM-DMED, an algorithm that minimizes the distribution-dependent regret. PM-DMED significantly outperforms state-of-the-art algorithms in numerical experiments. To show the optimality of PM-DMED with respect to the regret bound, we slightly modify the algorithm by introducing a hinge function (PM-DMED-Hinge). Then, we derive an asymptotically optimal regret upper bound of PM-DMED-Hinge that matches the lower bound.

Citations (22)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring (1509.09011v1)

Summary

Related Papers