Filtered Poisson Process Bandit on a Continuum (2007.09966v1)

Published 20 Jul 2020 in cs.LG and stat.ML

Abstract: We consider a version of the continuum armed bandit where an action induces a filtered realisation of a non-homogeneous Poisson process. Point data in the filtered sample are then revealed to the decision-maker, whose reward is the total number of revealed points. Using knowledge of the function governing the filtering, but without knowledge of the Poisson intensity function, the decision-maker seeks to maximise the expected number of revealed points over T rounds. We propose an upper confidence bound algorithm for this problem utilising data-adaptive discretisation of the action space. This approach enjoys O(T^(2/3)) regret under a Lipschitz assumption on the reward function. We provide lower bounds on the regret of any algorithm for the problem, via new lower bounds for related finite-armed bandits, and show that the orders of the upper and lower bounds match up to a logarithmic factor.
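
The reward model described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's algorithm: it assumes the action space and the point space are both [0, 1], and the names `example_intensity`, `example_keep_prob`, `sample_points`, `play_round` and `expected_reward` are hypothetical, chosen only for illustration. It relies on the standard thinning property of Poisson processes: if each point at x is kept independently with probability g(a, x), the number of revealed points is Poisson with mean equal to the integral of g(a, x) * lambda(x) over x.

```python
import math
import random


def example_intensity(x):
    # Hypothetical Poisson intensity on [0, 1]; unknown to the decision-maker.
    return 10.0 * (1.0 + math.sin(2.0 * math.pi * x))


def example_keep_prob(action, x):
    # Hypothetical filtering function (known to the decision-maker):
    # points near the chosen action are more likely to be revealed.
    return math.exp(-abs(x - action) / 0.1)


def sample_points(intensity, intensity_max, rng):
    # Non-homogeneous Poisson process on [0, 1] by thinning a homogeneous
    # process of rate intensity_max (requires intensity(x) <= intensity_max).
    points = []
    t = rng.expovariate(intensity_max)
    while t <= 1.0:
        if rng.random() < intensity(t) / intensity_max:
            points.append(t)
        t += rng.expovariate(intensity_max)
    return points


def play_round(action, intensity, intensity_max, keep_prob, rng):
    # One round: draw the process, filter it, and reward = number revealed.
    points = sample_points(intensity, intensity_max, rng)
    revealed = [x for x in points if rng.random() < keep_prob(action, x)]
    return revealed, len(revealed)


def expected_reward(action, intensity, keep_prob, grid=10000):
    # Midpoint-rule approximation of the integral of keep_prob * intensity.
    h = 1.0 / grid
    return sum(
        keep_prob(action, (i + 0.5) * h) * intensity((i + 0.5) * h)
        for i in range(grid)
    ) * h


if __name__ == "__main__":
    rng = random.Random(0)
    rewards = [
        play_round(0.25, example_intensity, 20.0, example_keep_prob, rng)[1]
        for _ in range(2000)
    ]
    print("empirical mean reward:", sum(rewards) / len(rewards))
    print("integral approximation:",
          expected_reward(0.25, example_intensity, example_keep_prob))
```

Under these assumptions, the empirical mean reward for a fixed action should be close to the numerical integral. A bandit algorithm in this setting would have to learn which action maximises that integral without access to `example_intensity`.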



Related Papers

Equilibrium Bandits: Learning Optimal Equilibria of Unknown Dynamics (2023)
Nash Regret Guarantees for Linear Bandits (2023)
The price of unfairness in linear bandits with biased feedback (2022)
Contextual Blocking Bandits (2020)
Thompson Sampling for Complex Bandit Problems (2013)