A Robust Quantile Huber Loss With Interpretable Parameter Adjustment In Distributional Reinforcement Learning (2401.02325v2)

Published 4 Jan 2024 in cs.LG and stat.ML

Abstract: Distributional Reinforcement Learning (RL) estimates the return distribution mainly by learning quantile values via minimizing the quantile Huber loss function, which entails a threshold parameter often selected heuristically or via hyperparameter search, and which may not generalize well and can be suboptimal. This paper introduces a generalized quantile Huber loss function derived from the Wasserstein distance (WD) between Gaussian distributions, capturing noise in predicted (current) and target (Bellman-updated) quantile values. Compared to the classical quantile Huber loss, this loss function enhances robustness against outliers. Notably, the classical Huber loss can be seen as an approximation of the proposed loss, enabling parameter adjustment by approximating the amount of noise in the data during the learning process. Empirical tests on Atari games, a common benchmark in distributional RL, and on a recent hedging strategy using distributional RL validate the effectiveness of the proposed loss function and its potential for parameter adjustment. The implementation of the proposed loss function is available here.
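
To ground the abstract's description, below is a minimal NumPy/SciPy sketch of the classical quantile Huber loss (the QR-DQN form), alongside a smooth variant of the kind a Gaussian-Wasserstein derivation suggests. The smooth form is the mean of a folded Gaussian in the TD error, shifted to vanish at zero; it is an illustrative reading of the abstract, not the paper's exact loss, and the function names and the noise-scale parameter b are assumptions.

```python
import numpy as np
from scipy.special import erf  # error function, used by the smooth variant


def quantile_huber_loss(u, tau, kappa=1.0):
    """Classical quantile Huber loss (QR-DQN style).

    u:     TD errors, target minus predicted quantile value (array).
    tau:   quantile fraction(s) in (0, 1), broadcastable against u.
    kappa: Huber threshold (> 0), usually chosen heuristically.
    """
    abs_u = np.abs(u)
    # Huber loss: quadratic within kappa of zero, linear beyond it.
    huber = np.where(abs_u <= kappa, 0.5 * u**2, kappa * (abs_u - 0.5 * kappa))
    # Asymmetric quantile weighting penalizes over-/under-estimation
    # according to the quantile fraction tau.
    return np.abs(tau - (u < 0.0)) * huber / kappa


def smooth_quantile_huber_loss(u, tau, b=1.0):
    """Hypothetical smooth counterpart: E|u + eps| - E|eps| with
    eps ~ N(0, b^2), i.e. a folded-Gaussian mean shifted so the loss
    is zero at u = 0. The scale b plays the role of kappa and can be
    read as an estimate of the noise in the quantile estimates.
    """
    c = b * np.sqrt(2.0 / np.pi)  # E|eps| for eps ~ N(0, b^2)
    smooth = c * np.exp(-(u**2) / (2.0 * b**2)) + u * erf(u / (np.sqrt(2.0) * b)) - c
    return np.abs(tau - (u < 0.0)) * smooth
```

Both losses are quadratic near zero and grow linearly in the tails, with kappa and b setting the crossover. The difference is that the smooth variant is infinitely differentiable and, per the abstract, its scale can be adjusted by approximating the amount of noise in the data during learning rather than by hyperparameter search.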

Authors (4)
  1. Parvin Malekzadeh (8 papers)
  2. Konstantinos N. Plataniotis (109 papers)
  3. Zissis Poulos (9 papers)
  4. Zeyu Wang (137 papers)
Citations (2)
