
Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras (2404.14064v2)

Published 22 Apr 2024 in cs.LG and cs.CV

Abstract: The performance of image-based Reinforcement Learning (RL) agents can vary depending on the position of the camera used to capture the images. Training on multiple cameras simultaneously, including a first-person egocentric camera, can leverage information from different camera perspectives to improve the performance of RL. However, hardware constraints may limit the availability of multiple cameras in real-world deployment. Additionally, cameras may become damaged in the real world, preventing access to all cameras that were used during training. To overcome these hardware constraints, we propose Multi-View Disentanglement (MVD), which uses multiple cameras to learn a policy that is robust to a reduction in the number of cameras, so that it generalises to any single camera from the training set. Our approach is a self-supervised auxiliary task for RL that learns a disentangled representation from multiple cameras, with a shared representation that is aligned across all cameras to allow generalisation to a single camera, and a private representation that is camera-specific. We show experimentally that an RL agent trained on a single third-person camera is unable to learn an optimal policy in many control tasks, whereas our approach, benefiting from multiple cameras during training, is able to solve the task using only the same single third-person camera.
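
The shared/private split described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `split_representation` and `alignment_loss`, the fixed split index, and the squared-distance alignment term are all assumptions chosen for illustration, since the abstract does not state the actual objective.

```python
import numpy as np


def split_representation(z, shared_dim):
    """Split an encoder output of shape (batch, dim) into two parts.

    Assumption: the first `shared_dim` features are treated as the shared
    representation and the remainder as the camera-specific private one.
    """
    return z[:, :shared_dim], z[:, shared_dim:]


def alignment_loss(rep_a, rep_b):
    """Mean squared distance between two batches of feature vectors.

    Assumption: a plain squared distance stands in for whatever alignment
    objective the paper uses; it is zero when the two batches agree.
    """
    return float(np.mean(np.sum((rep_a - rep_b) ** 2, axis=1)))


# Hypothetical encoder outputs for the same timestep seen from two cameras.
z_cam_a = np.array([[1.0, 2.0, 3.0, 4.0]])
z_cam_b = np.array([[1.0, 2.0, 9.0, 9.0]])

shared_a, private_a = split_representation(z_cam_a, shared_dim=2)
shared_b, private_b = split_representation(z_cam_b, shared_dim=2)

# The shared parts agree across cameras; the private parts are free to differ.
print(alignment_loss(shared_a, shared_b))
print(alignment_loss(private_a, private_b))
```

In this sketch, only the shared part would be pulled together across cameras, which is what would let a policy reading the shared features fall back to any single camera at deployment.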

Authors (2)
  1. Mhairi Dunion (5 papers)
  2. Stefano V. Albrecht (73 papers)