
Zero-shot cross-modal transfer of Reinforcement Learning policies through a Global Workspace (2403.04588v1)

Published 7 Mar 2024 in cs.AI

Abstract: Humans perceive the world through multiple senses, enabling them to create a comprehensive representation of their surroundings and to generalize information across domains. For instance, when a textual description of a scene is given, humans can mentally visualize it. In fields like robotics and Reinforcement Learning (RL), agents can also access information about the environment through multiple sensors; yet redundancy and complementarity between sensors are difficult to exploit as a source of robustness (e.g. against sensor failure) or generalization (e.g. transfer across domains). Prior research demonstrated that a robust and flexible multimodal representation can be efficiently constructed based on the cognitive science notion of a 'Global Workspace': a unique representation trained to combine information across modalities, and to broadcast its signal back to each modality. Here, we explore whether such a brain-inspired multimodal representation could be advantageous for RL agents. First, we train a 'Global Workspace' to exploit information collected about the environment via two input modalities (a visual input, or an attribute vector representing the state of the agent and/or its environment). Then, we train an RL agent policy using this frozen Global Workspace. In two distinct environments and tasks, our results reveal the model's ability to perform zero-shot cross-modal transfer between input modalities, i.e. to apply to image inputs a policy previously trained on attribute vectors (and vice versa), without additional training or fine-tuning. Variants and ablations of the full Global Workspace (including a CLIP-like multimodal representation trained via contrastive learning) did not display the same generalization abilities.
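The transfer mechanism the abstract describes can be illustrated with a toy sketch: two modality encoders map their inputs into one shared workspace latent, and a policy trained only on that latent then works regardless of which modality produced it. The sketch below is a hand-crafted illustration of this principle, not the paper's model: in the paper the encoders are neural networks trained with translation, cycle-consistency, and broadcast objectives, whereas here they are fixed linear maps constructed so both modalities land on the same latent. All names (`RENDER`, `W_ATTR`, `act`, the dimensions) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, IMG_DIM, GW_DIM, N_ACTIONS = 4, 6, 8, 3

# Toy "image" modality: a fixed linear rendering of the attribute state.
RENDER = rng.normal(size=(IMG_DIM, STATE_DIM))

# Modality encoders into a shared Global Workspace latent. In the paper these
# are trained; here they are constructed so that encoding an image is the same
# as "un-rendering" it and encoding the recovered attributes.
W_ATTR = rng.normal(size=(GW_DIM, STATE_DIM))
W_IMG = W_ATTR @ np.linalg.pinv(RENDER)

def encode_attributes(s: np.ndarray) -> np.ndarray:
    """Attribute-vector encoder into the workspace latent."""
    return W_ATTR @ s

def encode_image(v: np.ndarray) -> np.ndarray:
    """Image encoder into the same workspace latent."""
    return W_IMG @ v

# A policy head that only ever sees workspace latents (frozen random weights
# stand in for a policy trained on one modality's latents).
POLICY = rng.normal(size=(N_ACTIONS, GW_DIM))

def act(latent: np.ndarray) -> int:
    """Greedy action from the workspace latent."""
    return int(np.argmax(POLICY @ latent))

# One environment state, observed through both modalities.
s = rng.normal(size=STATE_DIM)
v = RENDER @ s

# Both observation routes yield the same latent, hence the same action:
# this is the zero-shot cross-modal transfer property in miniature.
assert np.allclose(encode_attributes(s), encode_image(v))
assert act(encode_attributes(s)) == act(encode_image(v))
```

The load-bearing property is that both encoders target one shared latent space; the paper's finding is that the Global Workspace training objectives produce this alignment well enough for policies to transfer, while a contrastive (CLIP-like) variant does not.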

Authors (4)
  1. Léopold Maytié
  2. Benjamin Devillers
  3. Alexandre Arnold
  4. Rufin VanRullen