
MaDi: Learning to Mask Distractions for Generalization in Visual Deep Reinforcement Learning

Published 23 Dec 2023 in cs.LG, cs.AI, cs.CV, and cs.RO (arXiv:2312.15339v1)

Abstract: The visual world provides an abundance of information, but many input pixels received by agents often contain distracting stimuli. Autonomous agents need the ability to distinguish useful information from task-irrelevant perceptions, enabling them to generalize to unseen environments with new distractions. Existing works approach this problem using data augmentation or large auxiliary networks with additional loss functions. We introduce MaDi, a novel algorithm that learns to mask distractions by the reward signal only. In MaDi, the conventional actor-critic structure of deep reinforcement learning agents is complemented by a small third sibling, the Masker. This lightweight neural network generates a mask to determine what the actor and critic will receive, such that they can focus on learning the task. The masks are created dynamically, depending on the current input. We run experiments on the DeepMind Control Generalization Benchmark, the Distracting Control Suite, and a real UR5 Robotic Arm. Our algorithm improves the agent's focus with useful masks, while its efficient Masker network only adds 0.2% more parameters to the original structure, in contrast to previous work. MaDi consistently achieves generalization results better than or competitive to state-of-the-art methods.


Summary

  • The paper introduces a novel Masker network that dynamically learns to mask irrelevant visual cues, improving RL generalization in visually distracting environments.
  • It augments traditional actor-critic models with only 0.2% extra parameters, achieving competitive performance with minimal computational overhead.
  • Experimental results across benchmarks, including the UR5 Robotic Arm, validate MaDi's effectiveness in boosting agent performance under challenging visual conditions.

Insights from "MaDi: Learning to Mask Distractions for Generalization in Visual Deep Reinforcement Learning"

The paper "MaDi: Learning to Mask Distractions for Generalization in Visual Deep Reinforcement Learning" introduces a novel approach to enhancing the generalization capabilities of reinforcement learning (RL) agents operating in visually distracting environments. The authors propose an algorithm, MaDi, that supplements traditional actor-critic architectures with a lightweight masking mechanism aimed at filtering out irrelevant visual information that can obscure task-relevant inputs.

One of the primary challenges addressed in this paper is the limited ability of RL agents to generalize across environments with varying visual characteristics. Traditional remedies such as data augmentation and auxiliary networks can mitigate this issue, but they often demand substantial computational resources and add non-trivial model complexity. MaDi improves upon these approaches by introducing a third component, the Masker network, which adds only a minimal number of parameters while helping the actor and critic networks focus on task-relevant features.

The key contribution of MaDi lies in its simplicity and efficacy. The Masker network is designed to learn from the environment's reward signal rather than external annotations or auxiliary loss functions. Through the critic’s loss function, the Masker learns to dynamically generate masks that highlight task-relevant visual cues while dimming distractions. By processing each frame individually, MaDi ensures that the agent receives optimized input for each state observation. The effectiveness of these masks is demonstrated across a range of tasks and environments from the DeepMind Control Generalization Benchmark, Distracting Control Suite, and a newly designed robotic setting involving a UR5 Robotic Arm—underscoring the generalization prowess of MaDi.
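The mechanism described above can be sketched in a few lines: a small network maps each observation to a per-pixel soft mask in (0, 1), and the element-wise product of mask and frame is what the actor and critic receive. The sketch below is a deliberately minimal stand-in (a single channel-wise projection plus sigmoid, in NumPy); the paper's actual Masker is a small convolutional network trained through the critic's loss, and all names and shapes here are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyMasker:
    """Hypothetical sketch of MaDi's masking idea: produce a soft
    per-pixel mask in (0, 1) and multiply it with the observation,
    dimming pixels deemed distracting. Not the paper's architecture."""

    def __init__(self, channels, seed=0):
        rng = np.random.default_rng(seed)
        # One weight per input channel stands in for a tiny conv net.
        self.w = rng.normal(0.0, 0.1, size=(channels,))
        self.b = 0.0

    def mask(self, obs):
        # obs: (H, W, C) float frame -> (H, W) soft mask in (0, 1).
        return sigmoid(obs @ self.w + self.b)

    def apply(self, obs):
        # Broadcast the mask over channels; the masked frame is what
        # the actor and critic would receive as input.
        m = self.mask(obs)[..., None]  # (H, W, 1)
        return obs * m

# Example: mask one 84x84 RGB frame, as used in pixel-based control.
obs = np.random.default_rng(1).random((84, 84, 3)).astype(np.float64)
masker = TinyMasker(channels=3)
masked = masker.apply(obs)
assert masked.shape == obs.shape
```

In training, the mask weights would be updated by backpropagating the critic's TD loss through the masked observation, so no extra loss function or annotation is needed, matching the reward-signal-only property the paper emphasizes.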

The experimental results underscore MaDi's competitive performance against state-of-the-art methods. On the video_easy and video_hard settings, MaDi consistently achieves superior or equally strong returns with minimal computational overhead, since the Masker network adds only 0.2% more parameters to the model. MaDi's applicability is further validated on a real UR5 Robotic Arm, where it maintains robust performance even in visually cluttered conditions.

Future avenues of exploration for the MaDi framework could include its integration with other advanced neural architectures, such as Vision Transformers (ViT), as there is a notable interest in understanding how these architectures perform in reinforcement learning contexts. Furthermore, the use of MaDi in transfer learning scenarios presents an intriguing prospect for leveraging its ability to fine-tune focus in environments requiring differentiated and adaptable learning processes.

In conclusion, "MaDi: Learning to Mask Distractions for Generalization in Visual Deep Reinforcement Learning" provides a technically significant contribution to the domain of vision-based deep RL, delivering a method that strikes a balance between computational efficiency and effective environment generalization. The strategic integration of the Masker network into deep RL pipelines represents a promising direction for further research into the optimization of decision-making processes within visually complex environments.
