Normal Bandits of Unknown Means and Variances: Asymptotic Optimality, Finite Horizon Regret Bounds, and a Solution to an Open Problem

Published 22 Apr 2015 in stat.ML and cs.LG | (1504.05823v2)

Abstract: Consider the problem of sampling sequentially from a finite number $N \geq 2$ of populations, specified by random variables $X^i_k$, $i = 1, \ldots, N$, and $k = 1, 2, \ldots$, where $X^i_k$ denotes the outcome from population $i$ the $k^{th}$ time it is sampled. It is assumed that for each fixed $i$, $\{ X^i_k \}_{k \geq 1}$ is a sequence of i.i.d. normal random variables, with unknown mean $\mu_i$ and unknown variance $\sigma_i^2$. The objective is to have a policy $\pi$ for deciding from which of the $N$ populations to sample at any time $n = 1, 2, \ldots$ so as to maximize the expected sum of outcomes of $n$ samples or, equivalently, to minimize the regret due to lack of information about the parameters $\mu_i$ and $\sigma_i^2$. In this paper, we present a simple inflated sample mean (ISM) index policy that is asymptotically optimal in the sense of Theorem 4 below. This resolves a standing open problem from Burnetas and Katehakis (1996). Additionally, finite horizon regret bounds are given.
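To make the setting concrete, the following is a minimal simulation sketch of an inflated sample mean index policy on normal populations. The abstract does not state the index itself, so the form used here, $\bar{X}^i + \hat{\sigma}_i \sqrt{n^{2/(T_i - 2)} - 1}$ with three initial samples per population, is an assumption and should be checked against the paper. The names `ism_index` and `simulate_ism` are hypothetical, and the regret reported is the pseudo-regret computed from the true means, which are known only to the simulator.

```python
import math
import random


def ism_index(mean, var, t, n):
    """Inflated sample mean index (ASSUMED form, not given in the abstract).

    mean, var: sample mean and biased sample variance of one population
    t: number of samples taken from that population so far (needs t >= 3)
    n: current round number
    """
    return mean + math.sqrt(var * (n ** (2.0 / (t - 2)) - 1.0))


def simulate_ism(mus, sigmas, horizon, seed=0):
    """Run the sketched policy for `horizon` rounds.

    Returns (counts, regret), where counts[i] is how often population i
    was sampled and regret is the pseudo-regret sum_i (mu* - mu_i) * counts[i].
    """
    rng = random.Random(seed)
    num_pops = len(mus)
    counts = [0] * num_pops
    means = [0.0] * num_pops
    m2 = [0.0] * num_pops  # running sum of squared deviations (Welford)

    for n in range(1, horizon + 1):
        if n <= 3 * num_pops:
            # Initial phase: sample every population three times (assumption).
            i = (n - 1) % num_pops
        else:
            # Afterwards: sample the population with the largest index.
            i = max(
                range(num_pops),
                key=lambda j: ism_index(means[j], m2[j] / counts[j], counts[j], n),
            )

        x = rng.gauss(mus[i], sigmas[i])

        # Welford update of the sample mean and squared deviations.
        counts[i] += 1
        delta = x - means[i]
        means[i] += delta / counts[i]
        m2[i] += delta * (x - means[i])

    best = max(mus)
    regret = sum((best - mu) * c for mu, c in zip(mus, counts))
    return counts, regret
```

Calling `simulate_ism([0.0, 2.0], [1.0, 1.0], horizon=2000)` returns the per-population sample counts and the pseudo-regret. Under the assumed index, the inflation term shrinks as a population accumulates samples, so the population with the larger mean should come to dominate the counts.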
