
Distributional Reinforcement Learning with Dual Expectile-Quantile Regression (2305.16877v4)

Published 26 May 2023 in cs.LG and cs.AI

Abstract: Distributional reinforcement learning (RL) has proven useful in multiple benchmarks as it enables approximating the full distribution of returns and extracting rich feedback from environment samples. The commonly used quantile regression approach to distributional RL -- based on asymmetric $L_1$ losses -- provides a flexible and effective way of learning arbitrary return distributions. In practice, it is often improved by using a more efficient, asymmetric hybrid $L_1$-$L_2$ Huber loss for quantile regression. However, by doing so, distributional estimation guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean. Indeed, asymmetric $L_2$ losses, corresponding to expectile regression, cannot be readily used for distributional temporal-difference learning. Motivated by the efficiency of $L_2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning while keeping an estimate of the full distribution of returns. We prove that our proposed operator converges to the distributional Bellman operator in the limit of infinitely many estimated quantile and expectile fractions, and we benchmark a practical implementation on a toy example and at scale. On the Atari benchmark, our approach matches the performance of the Huber-based IQN-1 baseline after $200$M training frames while avoiding distributional collapse and keeping estimates of the full distribution of returns.
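
The abstract contrasts three regression losses. As a point of reference, below is a minimal NumPy sketch of their standard definitions (asymmetric $L_1$ quantile loss, asymmetric hybrid $L_1$-$L_2$ Huber quantile loss as popularized by QR-DQN/IQN, and asymmetric $L_2$ expectile loss), applied to temporal-difference errors $u$ at a fraction $\tau$. The paper's joint expectile-quantile operator itself is not given in this excerpt, so the function names and example values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def quantile_loss(u, tau):
    """Asymmetric L1 (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    indicator = (u < 0).astype(u.dtype)
    return u * (tau - indicator)

def huber(u, kappa=1.0):
    """Standard Huber loss: quadratic for |u| <= kappa, linear beyond."""
    abs_u = np.abs(u)
    return np.where(abs_u <= kappa, 0.5 * u ** 2, kappa * (abs_u - 0.5 * kappa))

def huber_quantile_loss(u, tau, kappa=1.0):
    """Asymmetric hybrid L1-L2 loss: |tau - 1{u < 0}| * Huber_kappa(u) / kappa."""
    indicator = (u < 0).astype(u.dtype)
    return np.abs(tau - indicator) * huber(u, kappa) / kappa

def expectile_loss(u, tau):
    """Asymmetric L2 (expectile regression) loss: |tau - 1{u < 0}| * u^2."""
    indicator = (u < 0).astype(u.dtype)
    return np.abs(tau - indicator) * u ** 2

# Illustrative example: losses for a batch of TD errors at fraction tau = 0.75.
td_errors = np.array([-2.0, -0.5, 0.5, 2.0])
print(quantile_loss(td_errors, 0.75))
print(huber_quantile_loss(td_errors, 0.75, kappa=1.0))
print(expectile_loss(td_errors, 0.75))
```

In all three, $\tau$ weights positive and negative errors asymmetrically; the $L_2$-based expectile loss is smooth and typically easier to optimize, which is the efficiency motivation the abstract cites, while only the $L_1$-based quantile loss directly recovers quantiles of the return distribution.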

