
Exploration and Anti-Exploration with Distributional Random Network Distillation (2401.09750v4)

Published 18 Jan 2024 in cs.LG

Abstract: Exploration remains a critical issue in deep reinforcement learning for an agent to attain high returns in unknown environments. Although the prevailing Random Network Distillation (RND) exploration algorithm has been demonstrated to be effective in numerous environments, it often lacks sufficient discriminative power in bonus allocation. This paper highlights the "bonus inconsistency" issue within RND, pinpointing its primary limitation. To address this issue, we introduce Distributional RND (DRND), a derivative of RND. DRND enhances the exploration process by distilling a distribution of random networks and implicitly incorporating pseudo counts to improve the precision of bonus allocation. This refinement encourages agents to engage in more extensive exploration. Our method effectively mitigates the inconsistency issue without introducing significant computational overhead. Both theoretical analysis and experimental results demonstrate the superiority of our approach over the original RND algorithm. Our method excels in challenging online exploration scenarios and effectively serves as an anti-exploration mechanism in D4RL offline tasks. Our code is publicly available at https://github.com/yk7333/DRND.
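
As a rough illustration of the mechanism the abstract describes, the sketch below shows a DRND-style intrinsic bonus in PyTorch, assuming the predictor is distilled toward a set of frozen, randomly initialized target networks and the bonus mixes a mean-prediction error with a moment-based pseudo-count term. The network sizes, the mixing weight `alpha`, and the helper names (`target_moments`, `intrinsic_bonus`, `update_predictor`) are illustrative choices, not the authors' exact implementation; see the linked repository for that.

```python
import torch
import torch.nn as nn

# Illustrative dimensions and number of random target networks.
OBS_DIM, FEAT_DIM, N_TARGETS = 8, 64, 10

def make_net(in_dim, out_dim):
    # Small MLP used for both the frozen targets and the trained predictor.
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

# A distribution of fixed, randomly initialized target networks (never trained).
targets = [make_net(OBS_DIM, FEAT_DIM) for _ in range(N_TARGETS)]
for t in targets:
    for p in t.parameters():
        p.requires_grad_(False)

# A single predictor network, trained to match randomly sampled targets.
predictor = make_net(OBS_DIM, FEAT_DIM)
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def target_moments(obs):
    # First and second moments of the target-network outputs at each state.
    feats = torch.stack([t(obs) for t in targets])   # (N_TARGETS, batch, FEAT_DIM)
    mu = feats.mean(dim=0)                           # mean target feature
    b2 = (feats ** 2).mean(dim=0)                    # mean squared target feature
    return mu, b2

def intrinsic_bonus(obs, alpha=0.9, eps=1e-8):
    # The bonus combines (i) prediction error against the mean target, which
    # drives early exploration as in RND, and (ii) a variance-normalized term
    # that acts like an inverse pseudo-count for frequently visited states.
    with torch.no_grad():
        mu, b2 = target_moments(obs)
        pred = predictor(obs)
        novelty = ((pred - mu) ** 2).mean(dim=-1)
        var = (b2 - mu ** 2).clamp_min(eps)
        pseudo = (((pred ** 2 - mu ** 2).abs() / var).mean(dim=-1)).sqrt()
    return alpha * novelty + (1.0 - alpha) * pseudo

def update_predictor(obs):
    # Distillation step: regress the predictor onto one target network sampled
    # uniformly from the distribution of random networks.
    idx = torch.randint(len(targets), (1,)).item()
    with torch.no_grad():
        y = targets[idx](obs)
    loss = ((predictor(obs) - y) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage on a dummy batch of observations.
obs = torch.randn(32, OBS_DIM)
print(intrinsic_bonus(obs).shape)   # torch.Size([32])
print(update_predictor(obs))
```

In this sketch the second bonus term shrinks as the predictor's output statistics converge to those of the target distribution at a state, which is how repeated visits are implicitly counted without maintaining explicit visitation tables.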

