A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise Datasets (2110.04698v2)

Published 10 Oct 2021 in cs.LG

Abstract: Recent Offline Reinforcement Learning methods have succeeded in learning high-performance policies from fixed datasets of experience. A particularly effective approach learns to first identify and then mimic optimal decision-making strategies. Our work evaluates this method's ability to scale to vast datasets consisting almost entirely of sub-optimal noise. A thorough investigation on a custom benchmark helps identify several key challenges involved in learning from high-noise datasets. We re-purpose prioritized experience sampling to locate expert-level demonstrations among millions of low-performance samples. This modification enables offline agents to learn state-of-the-art policies in benchmark tasks using datasets where expert actions are outnumbered nearly 65:1.
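
The abstract's description of the method (learn a critic, clone only the dataset actions with positive estimated advantage, and re-purpose prioritized sampling so the rare expert transitions surface among the noisy majority) can be made concrete with a short sketch. The PyTorch code below is an illustrative assumption rather than the authors' implementation: the network architectures, hyperparameters (`hidden`, `alpha`, `eps`), and function names are invented here, and critic training, normalization, and other details of the paper are omitted.

```python
# Illustrative sketch of advantage-filtered behavioral cloning with
# advantage-based prioritized sampling. Not the authors' code; all
# architectures and hyperparameters below are assumptions.
import torch
import torch.nn as nn


class Critic(nn.Module):
    """Q(s, a) estimator; in practice trained offline (e.g. with TD learning)."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)


class Actor(nn.Module):
    """Deterministic policy with tanh-squashed actions."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, s):
        return self.net(s)


def estimated_advantage(critic, actor, s, a):
    # A(s, a) ~= Q(s, a) - Q(s, pi(s)): how much better the dataset action
    # is than what the current policy would choose.
    with torch.no_grad():
        return critic(s, a) - critic(s, actor(s))


def afbc_update(actor, critic, actor_opt, s, a):
    # Behavioral cloning restricted to transitions with positive estimated
    # advantage: first identify, then mimic, the good decisions.
    adv = estimated_advantage(critic, actor, s, a)
    mask = (adv > 0).float()
    per_sample_bc = ((actor(s) - a) ** 2).mean(dim=-1)
    loss = (mask * per_sample_bc).sum() / mask.sum().clamp(min=1.0)
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
    return adv


def advantage_priorities(adv, alpha=0.6, eps=1e-3):
    # Prioritized sampling re-purposed as a filter: higher-advantage
    # transitions are drawn more often, so the rare expert-level samples
    # are not drowned out by the roughly 65:1 majority of noisy data.
    p = (adv.clamp(min=0.0) + eps) ** alpha
    return p / p.sum()
```

A training loop could then draw batches with, for example, `torch.multinomial(advantage_priorities(adv), batch_size, replacement=True)`, so that high-advantage transitions are revisited far more frequently than the low-performance bulk of the dataset.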

