Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size (2211.11092v2)

Published 20 Nov 2022 in cs.LG, cs.AI, and cs.NE

Abstract: Training large neural networks is known to be time-consuming, with the learning duration taking days or even weeks. To address this problem, large-batch optimization was introduced. This approach demonstrated that scaling mini-batch sizes with appropriate learning rate adjustments can speed up the training process by orders of magnitude. While long training time was not typically a major issue for model-free deep offline RL algorithms, recently introduced Q-ensemble methods achieving state-of-the-art performance made this issue more relevant, notably extending the training duration. In this work, we demonstrate how this class of methods can benefit from large-batch optimization, which is commonly overlooked by the deep offline RL community. We show that scaling the mini-batch size and naively adjusting the learning rate allows for (1) a reduced size of the Q-ensemble, (2) stronger penalization of out-of-distribution actions, and (3) improved convergence time, effectively shortening training duration by 3-4x on average.
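
The abstract's recipe combines two ideas: a SAC-N-style Q-ensemble, whose per-element minimum over critics penalizes uncertain out-of-distribution actions, and a naive learning-rate adjustment (the linear scaling rule) applied when the mini-batch grows. Below is a minimal illustrative sketch of how these two pieces fit together; it is not the authors' implementation, and all names (QEnsemble, base_lr, scale, the network sizes, and the random data) are assumptions for demonstration. Target networks and the entropy term of a full SAC-N update are omitted for brevity.

```python
# Illustrative sketch only (not the paper's code): a SAC-N-style Q-ensemble
# critic update with a naive linear learning-rate scaling for large batches.
import torch
import torch.nn as nn


class QEnsemble(nn.Module):
    """N independent Q-networks over concatenated (state, action) inputs."""

    def __init__(self, state_dim: int, action_dim: int, n_critics: int = 10):
        super().__init__()
        self.nets = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                    nn.Linear(256, 256), nn.ReLU(),
                    nn.Linear(256, 1),
                )
                for _ in range(n_critics)
            ]
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        sa = torch.cat([state, action], dim=-1)
        # Stack per-critic outputs: shape (n_critics, batch, 1).
        return torch.stack([net(sa) for net in self.nets])


# Naive large-batch adjustment: multiply the batch size and the learning
# rate by the same factor (linear scaling rule); 256 and 3e-4 are common
# defaults used here only as placeholders.
base_batch, base_lr, scale = 256, 3e-4, 16
batch_size, lr = base_batch * scale, base_lr * scale

critics = QEnsemble(state_dim=17, action_dim=6, n_critics=10)
optimizer = torch.optim.Adam(critics.parameters(), lr=lr)

# One illustrative TD step on random data. Taking the minimum across the
# ensemble for the next-state value is what penalizes actions on which
# the critics disagree (i.e., out-of-distribution actions).
s, a = torch.randn(batch_size, 17), torch.randn(batch_size, 6)
r = torch.randn(batch_size, 1)
s2, a2 = torch.randn(batch_size, 17), torch.randn(batch_size, 6)

with torch.no_grad():
    target = r + 0.99 * critics(s2, a2).min(dim=0).values
loss = ((critics(s, a) - target.unsqueeze(0)) ** 2).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The paper's finding, in these terms, is that raising `scale` (the batch size, with the matching learning-rate adjustment) lets `n_critics` shrink while preserving the conservatism the ensemble minimum provides, which is where the 3-4x wall-clock savings come from.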

Authors (5)
  1. Alexander Nikulin (19 papers)
  2. Vladislav Kurenkov (22 papers)
  3. Denis Tarasov (15 papers)
  4. Dmitry Akimov (3 papers)
  5. Sergey Kolesnikov (29 papers)
Citations (13)