Stop Regressing: Training Value Functions via Classification for Scalable Deep RL (2403.03950v1)
Abstract: Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that value functions trained with categorical cross-entropy significantly improve performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.
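To make the "classification in place of regression" idea concrete, the sketch below shows one common way to train a value function with categorical cross-entropy: the scalar bootstrapped target is projected onto a fixed set of value bins via a "two-hot" distribution, and the network's logits over those bins are trained with cross-entropy. This is a minimal illustrative sketch in NumPy, not the paper's exact implementation; the bin layout and function names here are assumptions for illustration.

```python
import numpy as np

def two_hot(value, bin_centers):
    """Project a scalar target onto a categorical distribution over fixed
    bins (the "two-hot" scheme): mass is split between the two nearest bin
    centers so that the distribution's expectation equals the target."""
    value = np.clip(value, bin_centers[0], bin_centers[-1])
    upper = np.searchsorted(bin_centers, value)  # first bin center >= value
    probs = np.zeros_like(bin_centers)
    if upper == 0:  # target sits exactly on the lowest bin
        probs[0] = 1.0
        return probs
    lower = upper - 1
    p_upper = (value - bin_centers[lower]) / (bin_centers[upper] - bin_centers[lower])
    probs[lower] = 1.0 - p_upper
    probs[upper] = p_upper
    return probs

def categorical_value_loss(logits, target_value, bin_centers):
    """Cross-entropy between the predicted distribution over value bins
    and the two-hot projection of the bootstrapped scalar target."""
    target = two_hot(target_value, bin_centers)
    z = logits - logits.max()                      # numerically stable log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -(target * log_probs).sum()

def predicted_value(logits, bin_centers):
    """Recover a scalar value estimate as the expectation over bins."""
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    return probs @ bin_centers
```

For example, with bins at `[-1, -0.5, 0, 0.5, 1]`, a target of `0.25` becomes the distribution `[0, 0, 0.5, 0.5, 0]`, whose expectation is exactly `0.25`, so no value information is lost by the discretization (within the bin range).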