CROP: Towards Distributional-Shift Robust Reinforcement Learning using Compact Reshaped Observation Processing (2304.13616v2)
Abstract: The safe application of reinforcement learning (RL) requires generalization from limited training data to unseen scenarios. Yet, fulfilling tasks under changing circumstances remains a key challenge in RL. Current state-of-the-art approaches to generalization apply data augmentation techniques to increase the diversity of the training data. Although this prevents overfitting to the training environment(s), it hinders policy optimization. Crafting a suitable observation that contains only the crucial information has been shown to be a challenging task in itself. To improve data efficiency and generalization capabilities, we propose Compact Reshaped Observation Processing (CROP), which reduces the state information used for policy optimization. By providing only relevant information, overfitting to a specific training layout is precluded and generalization to unseen environments is improved. We formulate three CROPs that can be applied to fully observable observation and action spaces, and we provide a methodical foundation. We empirically demonstrate the improvements of CROP in a distributionally shifted safety gridworld and provide benchmark comparisons against full observability and data augmentation in two differently sized, procedurally generated mazes.
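The core idea described in the abstract, feeding the policy only a compact, relevant slice of the full state, can be illustrated with a short sketch. The snippet below crops a fully observable gridworld observation to a fixed-size window centered on the agent, padding at the grid borders so the policy always receives a layout-independent input of constant shape. The cell codes (`AGENT`, `PAD`), the window radius, and the function name are illustrative assumptions, not the paper's exact formulation of CROP.

```python
import numpy as np

AGENT = 2  # hypothetical cell code marking the agent's position
PAD = 1    # hypothetical cell code for out-of-bounds padding (e.g., wall)

def crop_observation(grid: np.ndarray, radius: int = 2) -> np.ndarray:
    """Return a (2*radius+1) x (2*radius+1) window of `grid` centered on the agent.

    Cells outside the grid are filled with PAD so the policy always sees a
    fixed-size observation, independent of the agent's absolute position
    or the layout of the full environment.
    """
    row, col = np.argwhere(grid == AGENT)[0]            # locate the agent
    padded = np.pad(grid, radius, constant_values=PAD)  # guard the borders
    row, col = row + radius, col + radius               # shift into padded coordinates
    return padded[row - radius: row + radius + 1,
                  col - radius: col + radius + 1]

if __name__ == "__main__":
    grid = np.zeros((7, 7), dtype=int)
    grid[3, 1] = AGENT
    print(crop_observation(grid))  # 5x5 window around the agent
```

Because the cropped window is defined relative to the agent rather than to absolute grid coordinates, two environments with different global layouts can produce identical local observations, which is the mechanism by which such a compact observation discourages overfitting to a specific training layout.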
Authors: Philipp Altmann, Fabian Ritz, Leonard Feuchtinger, Jonas Nüßlein, Claudia Linnhoff-Popien, Thomy Phan