OpenRL: A Unified Reinforcement Learning Framework (2312.16189v1)
Abstract: We present OpenRL, an advanced reinforcement learning (RL) framework designed to accommodate a diverse array of tasks, from single-agent challenges to complex multi-agent systems. OpenRL's robust support for self-play training empowers agents to develop advanced strategies in competitive settings. Notably, OpenRL integrates natural language processing (NLP) with RL, enabling researchers to combine RL training with language-centric tasks effectively. Built on PyTorch, OpenRL emphasizes modularity and a user-centric design. It offers a universal interface that keeps the experience simple for beginners while preserving the flexibility experts require for innovation and algorithm development. This balance enhances the framework's practicality, adaptability, and scalability, establishing a new standard in RL research. To explore OpenRL's features, we invite researchers and enthusiasts to visit our GitHub repository at https://github.com/OpenRL-Lab/openrl and our comprehensive documentation at https://openrl-docs.readthedocs.io.
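To make the "universal interface" claim concrete, below is a minimal single-agent training sketch in the style of OpenRL's quickstart. The module paths and names used here (`make`, `PPONet`, `PPOAgent`, `train`, `total_time_steps`) are assumptions drawn from the project README and may differ across versions, so treat this as an illustrative sketch rather than the definitive API; the linked documentation is the authoritative reference.

```python
# Minimal single-agent training sketch in the style of OpenRL's quickstart.
# NOTE: the module paths and class names below (make, PPONet, PPOAgent,
# total_time_steps) are assumed from the project README and may differ
# between versions; see https://openrl-docs.readthedocs.io for details.
from openrl.envs.common import make
from openrl.modules.common import PPONet as Net
from openrl.runners.common import PPOAgent as Agent

env = make("CartPole-v1", env_num=9)  # vectorized environment: 9 parallel copies
net = Net(env)                        # PPO actor-critic network sized to the env's spaces
agent = Agent(net)                    # agent object that owns the training loop
agent.train(total_time_steps=20000)   # run PPO for 20,000 environment steps
```

Per the abstract, the same interface is intended to carry over to multi-agent, self-play, and language-centric tasks by swapping in the corresponding environment and network, without changing the surrounding training code.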