
Neural Contextual Bandits with Deep Representation and Shallow Exploration (2012.01780v1)

Published 3 Dec 2020 in cs.LG and stat.ML

Abstract: We study a general class of contextual bandits, where each context-action pair is associated with a raw feature vector, but the reward generating function is unknown. We propose a novel learning algorithm that transforms the raw feature vector using the last hidden layer of a deep ReLU neural network (deep representation learning), and uses an upper confidence bound (UCB) approach to explore in the last linear layer (shallow exploration). We prove that under standard assumptions, our proposed algorithm achieves $\tilde{O}(\sqrt{T})$ finite-time regret, where $T$ is the learning time horizon. Compared with existing neural contextual bandit algorithms, our approach is computationally much more efficient since it only needs to explore in the last layer of the deep neural network.
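
The "deep representation, shallow exploration" idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the hidden ReLU layers are fixed, whereas the abstract's algorithm learns the representation, and it assumes the standard ridge-regression form of the UCB bonus. The names `relu_features`, `LastLayerUCB`, `alpha`, and `lam` are hypothetical and chosen for illustration only.

```python
import numpy as np


def relu_features(x, weights):
    """Map a raw feature vector to the output of the last hidden ReLU layer.

    `weights` is a list of hidden-layer matrices (assumed fixed in this sketch).
    """
    h = np.asarray(x, dtype=float)
    for W in weights:
        h = np.maximum(W @ h, 0.0)  # ReLU activation
    return h


class LastLayerUCB:
    """UCB exploration restricted to the last linear layer (illustrative only)."""

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha            # exploration strength (assumed hyperparameter)
        self.A = lam * np.eye(dim)    # regularized Gram matrix of chosen features
        self.b = np.zeros(dim)        # sum of reward-weighted chosen features

    def select(self, feats):
        """Pick the action with the largest upper confidence bound.

        `feats` has shape (num_actions, dim): one deep feature vector per action.
        """
        feats = np.asarray(feats, dtype=float)
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b        # ridge estimate of the last-layer weights
        # Confidence width sqrt(phi^T A^{-1} phi) for every action at once.
        bonus = np.sqrt(np.einsum("kd,de,ke->k", feats, A_inv, feats))
        return int(np.argmax(feats @ theta + self.alpha * bonus))

    def update(self, feat, reward):
        """Fold the observed (feature, reward) pair into the statistics."""
        feat = np.asarray(feat, dtype=float)
        self.A += np.outer(feat, feat)
        self.b += reward * feat
```

In each round, one would compute `relu_features` for every candidate action, call `select` on the stacked features, observe the reward, and call `update`. Because the confidence set lives only in the last layer, the matrix being inverted has the dimension of the last hidden layer rather than of all network parameters, which is the source of the computational saving the abstract describes.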

Authors (4)
  1. Pan Xu (68 papers)
  2. Zheng Wen (73 papers)
  3. Handong Zhao (38 papers)
  4. Quanquan Gu (198 papers)
Citations (65)
