Scalable Exploration via Ensemble++ (2407.13195v3)

Published 18 Jul 2024 in cs.LG, cs.AI, cs.HC, cs.IT, math.IT, and stat.ML

Abstract: Scalable exploration in high-dimensional, complex environments is a significant challenge in sequential decision making, especially when utilizing neural networks. Ensemble sampling, a practical approximation of Thompson sampling, is widely adopted but often suffers performance degradation due to ensemble coupling in shared-layer architectures, which reduces diversity and makes exploration ineffective. In this paper, we introduce Ensemble++, a novel method that addresses these challenges through architectural and algorithmic innovations. To prevent ensemble coupling, Ensemble++ decouples mean and uncertainty estimation by separating the base network from the ensemble components, and it employs a symmetrized loss function together with the stop-gradient operator. To further enhance exploration, it generates richer hypothesis spaces through random linear combinations of ensemble components using continuous index sampling. Theoretically, we prove that Ensemble++ matches the regret bounds of exact Thompson sampling in linear contextual bandits while maintaining a scalable per-step computational complexity of $\tilde{O}(\log T)$. This provides the first rigorous analysis demonstrating that ensemble sampling can be a scalable and effective approximation to Thompson sampling, closing a key theoretical gap in exploration efficiency. Empirically, we demonstrate Ensemble++'s effectiveness in both regret minimization and computational efficiency across a range of nonlinear bandit environments, including a language-based contextual bandit in which the agents employ GPT backbones. Our results highlight the capability of Ensemble++ for real-time adaptation in complex environments where computational and data collection budgets are constrained. Code: https://github.com/szrlee/Ensemble_Plus_Plus
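The idea of drawing hypotheses as random linear combinations of ensemble components can be illustrated with a minimal sketch for the linear contextual bandit case. This is not the authors' implementation: the function names, the Gaussian choice for the continuous index, and the array shapes are assumptions made purely for illustration, and the learning updates (symmetrized loss, stop-gradient) are omitted.

```python
import numpy as np

def sample_parameter(mean, factors, rng):
    """Draw one hypothesis: base mean plus a random linear combination
    of the ensemble components (the columns of `factors`).

    mean:    (d,) base estimate of the reward parameter (hypothetical name)
    factors: (d, M) matrix of M ensemble components (hypothetical name)
    """
    # Continuous index: one real-valued weight per ensemble component.
    # A standard Gaussian is an assumed choice for this sketch.
    index = rng.standard_normal(factors.shape[1])
    return mean + factors @ index

def select_action(contexts, mean, factors, rng):
    """Return the row index of `contexts` (n_actions, d) that scores
    highest under a single sampled hypothesis."""
    theta = sample_parameter(mean, factors, rng)
    return int(np.argmax(contexts @ theta))
```

Because the index is sampled from a continuous distribution rather than picked from a finite set of ensemble members, each call produces a different mixture of the M components. When all components are zero, the selection reduces to acting greedily on the base estimate.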

Authors (4)

Yingru Li (14 papers)
Jiawei Xu (64 papers)
Zhi-Quan Luo (115 papers)
Baoxiang Wang (69 papers)

GitHub

GitHub - szrlee/GPT-HyperAgent: The official code repo for HyperAgent for neural bandits and GPT-HyperAgent for content moderation. (2 stars)

Tweets

https://twitter.com/RichardYRLi/status/1870136592641544246
