EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data (2403.00564v2)
Abstract: Sample efficiency remains a crucial challenge in applying Reinforcement Learning (RL) to real-world tasks. While recent algorithms have made significant strides in improving sample efficiency, none have achieved consistently superior performance across diverse domains. In this paper, we introduce EfficientZero V2, a general framework for sample-efficient RL. We extend the performance of EfficientZero to multiple domains, encompassing both continuous and discrete actions, as well as visual and low-dimensional inputs. With a series of proposed improvements, EfficientZero V2 outperforms the current state-of-the-art (SOTA) by a significant margin on diverse tasks under the limited-data setting. EfficientZero V2 also marks a notable advance over the prevailing general algorithm, DreamerV3, achieving superior results in 50 of 66 evaluated tasks across diverse benchmarks, including Atari 100k, Proprio Control, and Vision Control.
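For context, the limited-data regime referenced above is defined by the benchmarks themselves; on Atari 100k, for instance, the agent is allowed only 100,000 environment interactions (400,000 raw frames with the standard frame skip of 4, roughly two hours of real-time play). The sketch below illustrates such a budget-capped interaction loop; the `gymnasium` API, the choice of game, and the random placeholder policy are illustrative assumptions, not part of EfficientZero V2 itself.

```python
# Minimal sketch of an Atari 100k-style interaction budget (an assumption-based
# illustration, not the EfficientZero V2 implementation).
# Requires gymnasium plus ale-py for the Atari environments.
import gymnasium as gym

ENV_STEP_BUDGET = 100_000  # agent steps allowed under the Atari 100k protocol

env = gym.make("ALE/Pong-v5")  # any Atari 100k game; a frame skip of 4 is built into v5
obs, info = env.reset(seed=0)

steps_used = 0
while steps_used < ENV_STEP_BUDGET:
    action = env.action_space.sample()  # placeholder for the learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    steps_used += 1
    if terminated or truncated:
        obs, info = env.reset()

env.close()
print(f"Interaction budget exhausted after {steps_used} agent steps.")
```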
- Solving Rubik’s Cube with a robot hand. arXiv preprint arXiv:1910.07113, 2019.
- Learning dexterous in-hand manipulation. The International Journal of Robotics Research, 39(1):3–20, 2020.
- Planning in stochastic environments with a learned model. In International Conference on Learning Representations, 2021.
- Bellman, R. A Markovian decision process. Journal of Mathematics and Mechanics, pp. 679–684, 1957.
- OpenAI Gym, 2016.
- A system for general in-hand object re-orientation. In Conference on Robot Learning, pp. 297–307. PMLR, 2022.
- Visual dexterity: In-hand reorientation of novel and complex object shapes. Science Robotics, 8(84):eadc9244, 2023. doi: 10.1126/scirobotics.adc9244. URL https://www.science.org/doi/abs/10.1126/scirobotics.adc9244.
- Exploring simple Siamese representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15750–15758, 2021.
- Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In International Conference on Computers and Games, pp. 72–83. Springer, 2006.
- Policy improvement by planning with Gumbel. In International Conference on Learning Representations, 2021.
- Multi-step reinforcement learning: A unifying algorithm. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
- Model-based value expansion for efficient model-free reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), 2018.
- Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pp. 1861–1870. PMLR, 2018.
- Dream to control: Learning behaviors by latent imagination. arXiv preprint arXiv:1912.01603, 2019.
- Mastering Atari with discrete world models. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=0oabwyZbOu.
- Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023.
- Temporal difference learning for model predictive control. arXiv preprint arXiv:2203.04955, 2022.
- TD-MPC2: Scalable, robust world models for continuous control, 2023.
- Learning and planning in complex action spaces. In International Conference on Machine Learning, pp. 4476–4486. PMLR, 2021.
- Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26):eaau5872, 2019.
- Model-based reinforcement learning for Atari. arXiv preprint arXiv:1903.00374, 2019.
- Almost optimal exploration in multi-armed bandits. In International Conference on Machine Learning, pp. 1238–1246. PMLR, 2013.
- Stochastic beams and where to find them: The Gumbel-top-k trick for sampling sequences without replacement. In International Conference on Machine Learning, pp. 3499–3508. PMLR, 2019.
- CURL: Contrastive unsupervised representations for reinforcement learning. In International Conference on Machine Learning, pp. 5639–5650. PMLR, 2020.
- Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
- DexPBT: Scaling up dexterous manipulation for hand-arm systems with population based training. arXiv preprint arXiv:2305.12127, 2023.
- Rubinstein, R. Y. Optimization of computer simulation models with rare events. European Journal of Operational Research, 99(1):89–112, 1997.
- Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588(7839):604–609, 2020.
- Online and offline reinforcement learning by planning with a learned model. Advances in Neural Information Processing Systems, 34:27580–27591, 2021.
- Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929, 2020.
- Bigger, better, faster: Human-level Atari with human-level efficiency. In International Conference on Machine Learning, pp. 30365–30380. PMLR, 2023.
- Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
- Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815, 2017.
- A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419):1140–1144, 2018.
- DeepMind Control Suite. arXiv preprint arXiv:1801.00690, 2018.
- DayDreamer: World models for physical robot learning, 2022.
- On layer normalization in the transformer architecture. In International Conference on Machine Learning, pp. 10524–10533. PMLR, 2020.
- Mastering visual continuous control: Improved data-augmented reinforcement learning. arXiv preprint arXiv:2107.09645, 2021.
- Mastering Atari games with limited data. Advances in Neural Information Processing Systems, 34:25476–25488, 2021.
- Efficient learning for AlphaZero via path consistency. In International Conference on Machine Learning, pp. 26971–26981. PMLR, 2022.
- Improving deep neural networks using softplus units. In 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–4. IEEE, 2015.
Authors: Shengjie Wang, Shaohuai Liu, Weirui Ye, Jiacheng You, Yang Gao