Stochastic Bandits with ReLU Neural Networks (2405.07331v1)

Published 12 May 2024 in cs.LG, cs.DS, and stat.ML

Abstract: We study the stochastic bandit problem with ReLU neural network structure. We show that a $\tilde{O}(\sqrt{T})$ regret guarantee is achievable by considering bandits with one-layer ReLU neural networks; to the best of our knowledge, our work is the first to achieve such a guarantee. In this specific setting, we propose an OFU-ReLU algorithm that can achieve this upper bound. The algorithm first explores randomly until it reaches a linear regime, and then implements a UCB-type linear bandit algorithm to balance exploration and exploitation. Our key insight is that we can exploit the piecewise linear structure of ReLU activations and convert the problem into a linear bandit in a transformed feature space, once we learn the parameters of ReLU relatively accurately during the exploration stage. To remove dependence on model parameters, we design an OFU-ReLU+ algorithm based on a batching strategy, which can provide the same theoretical guarantee.
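The linearization idea in the abstract can be illustrated with a small sketch. It is not the authors' OFU-ReLU algorithm; it only checks the piecewise-linear identity the abstract relies on. The reward model f(x) = sum_i ReLU(<theta_i, x>), the helper names, and the toy parameters and arms below are all assumptions made for illustration. When the estimated parameters give every neuron the same activation sign as the true ones on a given arm, the reward is exactly linear in a transformed feature vector that stacks one copy of x per neuron and zeroes the inactive ones.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def relu_network(thetas, x):
    """Assumed reward model: f(x) = sum_i max(0, <theta_i, x>)."""
    return sum(max(0.0, dot(theta, x)) for theta in thetas)


def transformed_features(theta_hats, x):
    """Stack one copy of x per neuron, zeroed where the ESTIMATED neuron is inactive."""
    phi = []
    for theta_hat in theta_hats:
        active = 1.0 if dot(theta_hat, x) >= 0 else 0.0
        phi.extend(active * xi for xi in x)
    return phi


def linear_prediction(thetas, theta_hats, x):
    """Linear form <flattened thetas, phi(x)> using the estimated activation pattern."""
    flat = [w for theta in thetas for w in theta]
    return dot(flat, transformed_features(theta_hats, x))


# Hypothetical toy instance: true parameters, rough estimates, and three arms.
thetas = [[1.0, -0.5], [-0.3, 0.8]]
theta_hats = [[0.9, -0.4], [-0.25, 0.7]]
arms = [[1.0, 0.2], [0.1, 1.0], [-0.6, -0.6]]

# Gap between the ReLU reward and its linear form on each arm.
gaps = [abs(relu_network(thetas, x) - linear_prediction(thetas, theta_hats, x))
        for x in arms]
print(gaps)
```

In this toy instance the estimates agree in sign with the true parameters on every arm, so the gaps are zero up to floating-point error. If an estimate were inaccurate enough to flip an activation on some arm, the linear form would no longer match there, which is why the abstract describes an exploration stage before the linear bandit phase.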

Authors (4)

Kan Xu
Hamsa Bastani
Surbhi Goel
Osbert Bastani

Tweets

https://twitter.com/StatMLPapers/status/1790231488161628622

https://twitter.com/AlgorithmPapers/status/1790334283325780274
