Maximin Action Identification: A New Bandit Framework for Games (1602.04676v1)

Published 15 Feb 2016 in math.ST, cs.GT, stat.ML, and stat.TH

Abstract: We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower-and upper-confidence bounds; and Maximin-Racing, which operates by successively eliminating the sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower bound analysis, and possible connections to an optimal algorithm.

Citations (28)

View on Semantic Scholar

Collections

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Paper Prompts

Explore 10 Community Prompts

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

Generate Now

Maximin Action Identification: A New Bandit Framework for Games (1602.04676v1)

Collections

Summary

Paper Prompts

Follow-up Questions

Authors (3)

Don't miss out on important new AI/ML research

Maximin Action Identification: A New Bandit Framework for Games (1602.04676v1)

Collections

Summary

Paper Prompts

Follow-up Questions

Related Papers

Authors (3)

Don't miss out on important new AI/ML research