Efficient Sparse-Reward Goal-Conditioned Reinforcement Learning with a High Replay Ratio and Regularization (2312.05787v1)

Published 10 Dec 2023 in cs.LG

Abstract: Reinforcement learning (RL) methods with a high replay ratio (RR) and regularization have gained interest due to their superior sample efficiency. However, these methods have mainly been developed for dense-reward tasks. In this paper, we aim to extend these RL methods to sparse-reward goal-conditioned tasks. We use Randomized Ensemble Double Q-learning (REDQ) (Chen et al., 2021), an RL method with a high RR and regularization. To apply REDQ to sparse-reward goal-conditioned tasks, we make the following modifications to it: (i) using hindsight experience replay and (ii) bounding target Q-values. We evaluate REDQ with these modifications on 12 sparse-reward goal-conditioned tasks of Robotics (Plappert et al., 2018), and show that it achieves about $2 \times$ better sample efficiency than previous state-of-the-art (SoTA) RL methods. Furthermore, we reconsider the necessity of specific components of REDQ and simplify it by removing unnecessary ones. The simplified REDQ with our modifications achieves $\sim 8 \times$ better sample efficiency than the SoTA methods in 4 Fetch tasks of Robotics.
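
To make the two modifications named in the abstract concrete, the sketch below shows how hindsight relabeling and target-value bounding might be wired into a REDQ-style critic update. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the Fetch-style sparse reward of -1 per non-success step and 0 on success (so every discounted return lies in [-1/(1-gamma), 0]), omits the SAC entropy term that REDQ normally carries, and all names (bounded_redq_target, her_relabel, SUBSET_SIZE, ...) are illustrative.

```python
import torch

# Illustrative hyperparameters (assumed, not taken from the paper).
GAMMA = 0.98        # discount factor
ENSEMBLE_SIZE = 10  # N target Q-networks in the REDQ ensemble
SUBSET_SIZE = 2     # M networks whose minimum forms the target

# With sparse rewards of -1 per non-success step and 0 on success,
# every discounted return lies in [-1 / (1 - gamma), 0].
Q_MIN = -1.0 / (1.0 - GAMMA)
Q_MAX = 0.0


def her_relabel(obs, action, next_obs, achieved_goal, future_achieved_goal,
                compute_reward):
    """Modification (i): hindsight relabeling, 'future' strategy (sketch).

    The original goal is replaced by a goal actually achieved later in the
    same episode, and the sparse reward is recomputed for that new goal.
    """
    new_goal = future_achieved_goal
    new_reward = compute_reward(achieved_goal, new_goal)  # 0 if reached, else -1
    return obs, action, new_reward, next_obs, new_goal


def bounded_redq_target(reward, done, next_obs_and_goal, target_q_nets, policy):
    """Modification (ii): REDQ-style min-target, clipped to the feasible range.

    reward, done: (B,) tensors; next_obs_and_goal: (B, obs_dim + goal_dim)
    tensor of next observations concatenated with (possibly relabeled) goals.
    """
    with torch.no_grad():
        next_action = policy(next_obs_and_goal)
        # REDQ takes the min over a random subset of the target ensemble
        # (entropy bonus omitted here for brevity).
        idx = torch.randperm(ENSEMBLE_SIZE)[:SUBSET_SIZE]
        qs = torch.stack([target_q_nets[i](next_obs_and_goal, next_action)
                          for i in idx])
        target = reward + GAMMA * (1.0 - done) * qs.min(dim=0).values
        # Bounding the target suppresses value overestimation, which is
        # especially harmful when training with a high replay ratio.
        return target.clamp(Q_MIN, Q_MAX)
```

Each critic in the ensemble would then regress toward this clipped target on minibatches drawn from the HER-augmented replay buffer, with several gradient updates per environment step (the high replay ratio).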

References (72)
  1. Deep reinforcement learning at the edge of the statistical precipice. In Proc. NeurIPS, 2021.
  2. Pulkit Agrawal. The task specification problem. In Proc. CoRL, 2022.
  3. Hindsight experience replay. In Proc. NeurIPS, 2017.
  4. Learning dexterous in-hand manipulation. The International Journal of Robotics Research, 39(1):3–20, 2020.
  5. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
  6. Efficient online reinforcement learning with offline data. arXiv preprint arXiv:2302.02948, 2023.
  7. A survey of meta-reinforcement learning. arXiv preprint arXiv:2301.08028, 2023.
  8. Model-free episodic control. arXiv preprint arXiv:1606.04460, 2016.
  9. The perils of trial-and-error reward design: misdesign through overfitting and invalid task specifications. In Proc. AAAI, 2023.
  10. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
  11. RT-2: Vision-language-action models transfer web knowledge to robotic control. arXiv preprint arXiv:2307.15818, 2023.
  12. Randomized ensembled double Q-learning: Learning fast without a model. In Proc. ICLR, 2021.
  13. Diversity-based trajectory and goal selection with hindsight experience replay. In Proc. PRICAI, 2021.
  14. Gymnasium Robotics, 2023. URL http://github.com/Farama-Foundation/Gymnasium-Robotics.
  15. Sample-efficient reinforcement learning by breaking the replay ratio barrier. In Proc. ICLR, 2023.
  16. Go-explore: a new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995, 2019.
  17. Curriculum-guided hindsight experience replay. In Proc. NeurIPS, 2019.
  18. D4RL: datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219, 2020.
  19. Addressing function approximation error in actor-critic methods. In Proc. ICML, 2018.
  20. For SALE: State-action representation learning for deep reinforcement learning. arXiv preprint arXiv:2306.02451, 2023.
  21. Distributed reinforcement learning of targeted grasping with active vision for mobile manipulators. In Proc. IROS, 2020.
  22. Co-adaptation of algorithmic and implementational innovations in inference-based deep reinforcement learning. In Proc. NeurIPS, 2021.
  23. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proc. ICML, 2018.
  24. Learning agile soccer skills for a bipedal robot with deep reinforcement learning. arXiv preprint arXiv:2304.13653, 2023.
  25. Dropout Q-functions for doubly efficient reinforcement learning. In Proc. ICLR, 2022.
  26. QGraph-bounded Q-learning: Stabilizing model-free off-policy deep reinforcement learning. arXiv preprint arXiv:2007.07582, 2020.
  27. When to trust your model: Model-based policy optimization. In Proc. NeurIPS, 2019.
  28. Champion-level drone racing using deep reinforcement learning. Nature, 620(7976):982–987, 2023.
  29. Adam: A method for stochastic optimization. In Proc. ICLR, 2015.
  30. Reward (mis)design for autonomous driving. Artificial Intelligence, 316:103829, 2023.
  31. Implicit under-parameterization inhibits data-efficient deep reinforcement learning. In Proc. ICLR, 2021.
  32. PLASTIC: Improving input and label plasticity for sample efficient reinforcement learning. In Proc. NeurIPS, 2023.
  33. Learning quadrupedal locomotion over challenging terrain. Science Robotics, 5(47):eabc5986, 2020.
  34. Efficient deep reinforcement learning requires regulating overfitting. In Proc. ICLR, 2023a.
  35. Accelerating exploration with unlabeled prior data. arXiv preprint arXiv:2311.05067, 2023b.
  36. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
  37. Episodic memory deep Q-networks. In Proc. IJCAI, 2018.
  38. Goal-conditioned reinforcement learning: Problems and solutions. arXiv preprint arXiv:2201.08299, 2022.
  39. Relay hindsight experience replay: Continual reinforcement learning for robot manipulation tasks with sparse rewards. arXiv preprint arXiv:2208.00843, 2022.
  40. Guided meta-policy search. In Proc. NeurIPS, pp. 9653–9664, 2019.
  41. Cal-QL: Calibrated offline RL pre-training for efficient online fine-tuning. arXiv preprint arXiv:2303.05479, 2023.
  42. The primacy bias in deep reinforcement learning. In Proc. ICML, 2022.
  43. Self-imitation learning. In Proc. ICML, 2018.
  44. Hierarchical reinforcement learning: A comprehensive survey. ACM Computing Surveys (CSUR), 54(5):1–35, 2021.
  45. Curiosity-driven exploration by self-supervised prediction. In Proc. ICML, 2017.
  46. Maximum entropy gain exploration for long horizon multi-goal reinforcement learning. In Proc. ICML, 2020.
  47. Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464, 2018.
  48. Exploration via hindsight goal generation. In Proc. NeurIPS, 2019.
  49. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  50. Bigger, better, faster: Human-level Atari with human-level efficiency. In Proc. ICML, 2023.
  51. Self-improving robots: End-to-end autonomous visuomotor reinforcement learning. arXiv preprint arXiv:2303.01488, 2023.
  52. Learning to play in a day: Faster deep reinforcement learning by optimality tightening. In Proc. ICLR, 2017.
  53. A walk in the park: Learning to walk in 20 minutes with model-free reinforcement learning. arXiv preprint arXiv:2208.07860, 2022.
  54. Grow your limits: Continuous improvement with real-world rl for robotic locomotion. arXiv preprint arXiv:2310.17634, 2023a.
  55. Learning and adapting agile locomotion skills by transferring experience. arXiv preprint arXiv:2304.09834, 2023b.
  56. The dormant neuron phenomenon in deep reinforcement learning. arXiv preprint arXiv:2302.12902, 2023.
  57. FastRLAP: A system for learning high-speed driving via deep RL and autonomous practicing. arXiv preprint arXiv:2304.09831, 2023.
  58. Reinforcement learning: An introduction. MIT press, 2018.
  59. #Exploration: A study of count-based exploration for deep reinforcement learning. In Proc. NeurIPS, 2017.
  60. Yunhao Tang. Self-imitation learning via generalized lower bound Q-learning. In Proc. NeurIPS, 2020.
  61. MuJoCo: A physics engine for model-based control. In Proc. IROS, pp. 5026–5033, 2012.
  62. Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. In Proc. NeurIPS, 2019.
  63. Deep reinforcement learning and the deadly triad. arXiv preprint arXiv:1812.02648, 2018.
  64. Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards. arXiv preprint arXiv:1707.08817, 2017.
  65. A survey of multi-task deep reinforcement learning. Electronics, 9(9):1363, 2020.
  66. Outracing champion gran turismo drivers with deep reinforcement learning. Nature, 602(7896):223–228, 2022.
  67. Efficient multi-goal reinforcement learning via value consistency prioritization. Journal of Artificial Intelligence Research, 77:355–376, 2023.
  68. Language to rewards for robotic skill synthesis. arXiv preprint arXiv:2306.08647, 2023.
  69. Le Zhao and Wei Xu. Faster reinforcement learning with value target lower bounding, 2023. URL https://openreview.net/forum?id=WWYHBZ1wWzp.
  70. Energy-based hindsight experience prioritization. In Proc. CoRL, 2018.
  71. Curiosity-driven experience prioritization via density estimation. arXiv preprint arXiv:1902.08039, 2019.
  72. Maximum entropy-regularized multi-goal reinforcement learning. In Proc. ICML, 2019.