
A Survey Analyzing Generalization in Deep Reinforcement Learning (2401.02349v2)

Published 4 Jan 2024 in cs.LG, cs.AI, and stat.ML

Abstract: Reinforcement learning research has achieved significant success and attention through the use of deep neural networks to solve problems in high-dimensional state or action spaces. While deep reinforcement learning policies are currently being deployed in many different fields, from medical applications to LLMs, open questions remain about the generalization capabilities of these policies. In this paper, we formalize and analyze generalization in deep reinforcement learning. We explain the fundamental reasons why deep reinforcement learning policies encounter overfitting problems that limit their generalization capabilities. Furthermore, we categorize and explain the diverse solution approaches for increasing generalization and overcoming overfitting in deep reinforcement learning policies. From exploration to adversarial analysis, and from regularization to robustness, our paper provides an analysis of a wide range of subfields within deep reinforcement learning, with both broad scope and an in-depth view. We believe our study can serve as a compact guideline for the current advancements in deep reinforcement learning and help in constructing robust deep neural policies with stronger generalization capabilities.
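
The generalization gap the abstract describes is commonly quantified with a train/test split over environment instances: a policy is trained on a finite set of levels or MDPs and evaluated on held-out ones, and the drop in return measures overfitting. The sketch below illustrates this protocol in its most extreme form, a policy that memorizes its training observations; the `RandomBandit` environment, its hidden rule, and all sizes are illustrative assumptions rather than anything specified in the survey.

```python
# Minimal sketch: measuring the generalization gap via a train/test split
# over environment instances. Everything here is illustrative.
import numpy as np

rng = np.random.default_rng(0)

class RandomBandit:
    """One-step MDP: observe a feature vector, choose one of two actions.
    Which action is rewarded depends on a simple hidden rule."""
    def __init__(self, seed):
        r = np.random.default_rng(seed)
        self.obs = r.normal(size=8)                  # observation for this instance
        self.good_action = int(self.obs.sum() > 0)   # hidden rule

    def step(self, action):
        return 1.0 if action == self.good_action else 0.0

# Finite set of training instances vs. held-out test instances.
train_envs = [RandomBandit(s) for s in range(50)]
test_envs = [RandomBandit(s) for s in range(1000, 1050)]

# A purely tabular "policy" that memorizes each training observation
# instead of learning the underlying rule: the extreme case of overfitting.
memory = {env.obs.tobytes(): env.good_action for env in train_envs}

def act(obs):
    # Recall the memorized answer; guess uniformly on unseen observations.
    return memory.get(obs.tobytes(), int(rng.integers(2)))

def mean_return(envs):
    return float(np.mean([env.step(act(env.obs)) for env in envs]))

print(f"train return: {mean_return(train_envs):.2f}")  # ~1.00 (memorized)
print(f"test return:  {mean_return(test_envs):.2f}")   # ~0.50 (chance level)
```

The memorizing policy reaches near-perfect return on its training instances but only chance-level return on held-out ones; closing exactly this gap is what the regularization, augmentation, and robustness methods surveyed in the paper aim at.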
