Efficient Sparse-Reward Goal-Conditioned Reinforcement Learning with a High Replay Ratio and Regularization (2312.05787v1)
Abstract: Reinforcement learning (RL) methods with a high replay ratio (RR) and regularization have gained interest due to their superior sample efficiency. However, these methods have mainly been developed for dense-reward tasks. In this paper, we aim to extend these RL methods to sparse-reward goal-conditioned tasks. We use Randomized Ensemble Double Q-learning (REDQ) (Chen et al., 2021), an RL method with a high RR and regularization. To apply REDQ to sparse-reward goal-conditioned tasks, we make two modifications to it: (i) using hindsight experience replay and (ii) bounding target Q-values. We evaluate REDQ with these modifications on 12 sparse-reward goal-conditioned tasks from the Robotics suite (Plappert et al., 2018), and show that it achieves about $2\times$ better sample efficiency than previous state-of-the-art (SoTA) RL methods. Furthermore, we reconsider the necessity of specific components of REDQ and simplify it by removing unnecessary ones. The simplified REDQ with our modifications achieves about $8\times$ better sample efficiency than the SoTA methods on the 4 Fetch tasks of the Robotics suite.
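The two modifications can be illustrated with a minimal sketch, assuming the standard sparse goal-conditioned reward in $\{-1, 0\}$ (so discounted returns lie in $[-1/(1-\gamma),\, 0]$, which gives the bound on target Q-values). The function names and transition layout below are ours for illustration, not the paper's code:

```python
import numpy as np

GAMMA = 0.99
# Sparse rewards in {-1, 0} imply Q-values bounded in [-1/(1-gamma), 0].
Q_MIN, Q_MAX = -1.0 / (1.0 - GAMMA), 0.0


def her_relabel(episode, rng, k=4):
    """Hindsight relabeling ('future' strategy): for each transition, also
    store k copies whose goal is an achieved state from a later timestep,
    with the sparse reward recomputed against that substituted goal."""
    relabeled = []
    T = len(episode)
    for t, (obs, action, achieved, next_achieved, goal) in enumerate(episode):
        # Original transition with its original goal.
        r = 0.0 if np.allclose(next_achieved, goal) else -1.0
        relabeled.append((obs, action, goal, r))
        for _ in range(k):
            future_t = rng.integers(t, T)          # a later (or current) step
            new_goal = episode[future_t][3]        # its achieved state as goal
            r = 0.0 if np.allclose(next_achieved, new_goal) else -1.0
            relabeled.append((obs, action, new_goal, r))
    return relabeled


def bounded_target(reward, next_q, done):
    """Clip the TD target into the range attainable under {-1, 0} rewards,
    preventing overestimated bootstrap values from propagating."""
    target = reward + GAMMA * (1.0 - done) * next_q
    return float(np.clip(target, Q_MIN, Q_MAX))
```

Clipping the target is what keeps a high replay ratio from amplifying Q-value overestimation: even if the bootstrap estimate `next_q` diverges, the regression target stays within the physically attainable return range.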
- Deep reinforcement learning at the edge of the statistical precipice. In Proc. NeurIPS, 2021.
- Pulkit Agrawal. The task specification problem. In Proc. CoRL, 2022.
- Hindsight experience replay. In Proc. NeurIPS, 2017.
- Learning dexterous in-hand manipulation. The International Journal of Robotics Research, 39(1):3–20, 2020.
- Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
- Efficient online reinforcement learning with offline data. arXiv preprint arXiv:2302.02948, 2023.
- A survey of meta-reinforcement learning. arXiv preprint arXiv:2301.08028, 2023.
- Model-free episodic control. arXiv preprint arXiv:1606.04460, 2016.
- The perils of trial-and-error reward design: misdesign through overfitting and invalid task specifications. In Proc. AAAI, 2023.
- OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
- RT-2: Vision-language-action models transfer web knowledge to robotic control. arXiv preprint arXiv:2307.15818, 2023.
- Randomized ensembled double Q-learning: Learning fast without a model. In Proc. ICLR, 2021.
- Diversity-based trajectory and goal selection with hindsight experience replay. In Proc. PRICAI, 2021.
- Gymnasium Robotics, 2023. URL http://github.com/Farama-Foundation/Gymnasium-Robotics.
- Sample-efficient reinforcement learning by breaking the replay ratio barrier. In Proc. ICLR, 2023.
- Go-explore: a new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995, 2019.
- Curriculum-guided hindsight experience replay. In Proc. NeurIPS, 2019.
- D4RL: datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219, 2020.
- Addressing function approximation error in actor-critic methods. In Proc. ICML, 2018.
- For SALE: State-action representation learning for deep reinforcement learning. arXiv preprint arXiv:2306.02451, 2023.
- Distributed reinforcement learning of targeted grasping with active vision for mobile manipulators. In Proc. IROS, 2020.
- Co-adaptation of algorithmic and implementational innovations in inference-based deep reinforcement learning. In Proc. NeurIPS, 2021.
- Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proc. ICML, 2018.
- Learning agile soccer skills for a bipedal robot with deep reinforcement learning. arXiv preprint arXiv:2304.13653, 2023.
- Dropout Q-functions for doubly efficient reinforcement learning. In Proc. ICLR, 2022.
- Qgraph-bounded Q-learning: Stabilizing model-free off-policy deep reinforcement learning. arXiv preprint arXiv:2007.07582, 2020.
- When to trust your model: Model-based policy optimization. In Proc. NeurIPS, 2019.
- Champion-level drone racing using deep reinforcement learning. Nature, 620(7976):982–987, 2023.
- Adam: A method for stochastic optimization. In Proc. ICLR, 2015.
- Reward (mis) design for autonomous driving. Artificial Intelligence, 316:103829, 2023.
- Implicit under-parameterization inhibits data-efficient deep reinforcement learning. In Proc. ICLR, 2021.
- PLASTIC: Improving input and label plasticity for sample efficient reinforcement learning. In Proc. NeurIPS, 2023.
- Learning quadrupedal locomotion over challenging terrain. Science Robotics, 5(47):eabc5986, 2020.
- Efficient deep reinforcement learning requires regulating overfitting. In Proc. ICLR, 2023a.
- Accelerating exploration with unlabeled prior data. arXiv preprint arXiv:2311.05067, 2023b.
- Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
- Episodic memory deep Q-networks. In Proc. IJCAI, 2018.
- Goal-conditioned reinforcement learning: Problems and solutions. arXiv preprint arXiv:2201.08299, 2022.
- Relay hindsight experience replay: Continual reinforcement learning for robot manipulation tasks with sparse rewards. arXiv preprint arXiv:2208.00843, 2022.
- Guided meta-policy search. In Proc. NeurIPS, 2019.
- Cal-QL: Calibrated offline RL pre-training for efficient online fine-tuning. arXiv preprint arXiv:2303.05479, 2023.
- The primacy bias in deep reinforcement learning. In Proc. ICML, 2022.
- Self-imitation learning. In Proc. ICML, 2018.
- Hierarchical reinforcement learning: A comprehensive survey. ACM Computing Surveys (CSUR), 54(5):1–35, 2021.
- Curiosity-driven exploration by self-supervised prediction. In Proc. ICML, 2017.
- Maximum entropy gain exploration for long horizon multi-goal reinforcement learning. In Proc. ICML, 2020.
- Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464, 2018.
- Exploration via hindsight goal generation. In Proc. NeurIPS, 2019.
- Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- Bigger, better, faster: Human-level Atari with human-level efficiency. In Proc. ICML, 2023.
- Self-improving robots: End-to-end autonomous visuomotor reinforcement learning. arXiv preprint arXiv:2303.01488, 2023.
- Learning to play in a day: Faster deep reinforcement learning by optimality tightening. In Proc. ICLR, 2017.
- A walk in the park: Learning to walk in 20 minutes with model-free reinforcement learning. arXiv preprint arXiv:2208.07860, 2022.
- Grow your limits: Continuous improvement with real-world rl for robotic locomotion. arXiv preprint arXiv:2310.17634, 2023a.
- Learning and adapting agile locomotion skills by transferring experience. arXiv preprint arXiv:2304.09834, 2023b.
- The dormant neuron phenomenon in deep reinforcement learning. arXiv preprint arXiv:2302.12902, 2023.
- FastRLAP: A system for learning high-speed driving via deep RL and autonomous practicing. arXiv preprint arXiv:2304.09831, 2023.
- Reinforcement learning: An introduction. MIT press, 2018.
- #Exploration: A study of count-based exploration for deep reinforcement learning. In Proc. NeurIPS, 2017.
- Yunhao Tang. Self-imitation learning via generalized lower bound Q-learning. In Proc. NeurIPS, 2020.
- MuJoCo: A physics engine for model-based control. In Proc. IROS, 2012.
- Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. In Proc. NeurIPS, 2019.
- Deep reinforcement learning and the deadly triad. arXiv preprint arXiv:1812.02648, 2018.
- Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards. arXiv preprint arXiv:1707.08817, 2017.
- A survey of multi-task deep reinforcement learning. Electronics, 9(9):1363, 2020.
- Outracing champion gran turismo drivers with deep reinforcement learning. Nature, 602(7896):223–228, 2022.
- Efficient multi-goal reinforcement learning via value consistency prioritization. Journal of Artificial Intelligence Research, 77:355–376, 2023.
- Language to rewards for robotic skill synthesis. arXiv preprint arXiv:2306.08647, 2023.
- Le Zhao and Wei Xu. Faster reinforcement learning with value target lower bounding, 2023. URL https://openreview.net/forum?id=WWYHBZ1wWzp.
- Energy-based hindsight experience prioritization. In Proc. CoRL, 2018.
- Curiosity-driven experience prioritization via density estimation. arXiv preprint arXiv:1902.08039, 2019.
- Maximum entropy-regularized multi-goal reinforcement learning. In Proc. ICML, 2019.