GLIDE-RL: Grounded Language Instruction through DEmonstration in RL (2401.02991v1)
Abstract: One of the final frontiers in the development of complex human-AI collaborative systems is the ability of AI agents to comprehend natural language and perform tasks accordingly. However, training efficient Reinforcement Learning (RL) agents grounded in natural language has been a long-standing challenge due to the complexity and ambiguity of language and the sparsity of rewards, among other factors. Several advances in reinforcement learning, curriculum learning, continual learning, and Large Language Models (LLMs) have independently contributed to the effective training of grounded agents in various environments. Leveraging these developments, we present a novel algorithm, Grounded Language Instruction through DEmonstration in RL (GLIDE-RL), which introduces a teacher-instructor-student curriculum learning framework for training an RL agent that can follow natural language instructions and generalize to previously unseen instructions. In this multi-agent framework, the teacher and student agents learn simultaneously, adapting to the student's current skill level. We further demonstrate the necessity of training the student agent with not just one, but multiple teacher agents. Experiments in a complex sparse-reward environment validate the effectiveness of our proposed approach.
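The teacher-instructor-student loop described above can be sketched in miniature. This is a hedged toy illustration, not the paper's implementation: the `Teacher`, `Student`, and `describe` names are hypothetical, the "instructor" (a language model in the paper) is replaced by a string template, and the student's learning is reduced to a success probability that grows with practice. The adversarial teacher reward (teachers are rewarded when the student fails, as in asymmetric self-play) is the one structural element taken from the abstract.

```python
import random


class Teacher:
    """Proposes goals for the student; rewarded when the student fails,
    which pushes the curriculum toward the frontier of the student's skill."""

    def __init__(self, grid_size):
        self.grid_size = grid_size
        self.reward = 0.0

    def propose_goal(self):
        # A goal here is just a cell in a toy grid world.
        return (random.randrange(self.grid_size), random.randrange(self.grid_size))


def describe(goal):
    """Stand-in 'instructor': maps a goal to a natural-language instruction.
    In GLIDE-RL this role is played by a language model; here it is a template."""
    return f"go to row {goal[0]}, column {goal[1]}"


class Student:
    """Toy goal-conditioned policy: succeeds with a probability ('skill')
    that increases slightly each time it succeeds."""

    def __init__(self):
        self.skill = 0.1

    def attempt(self, instruction):
        success = random.random() < self.skill
        if success:
            # Crude stand-in for policy improvement.
            self.skill = min(1.0, self.skill + 0.05)
        return success


def curriculum_step(teachers, student):
    """One round: each teacher proposes a goal, the instructor verbalizes it,
    and the student attempts it; teachers earn reward 1 when the student fails."""
    results = []
    for teacher in teachers:
        goal = teacher.propose_goal()
        instruction = describe(goal)
        success = student.attempt(instruction)
        teacher.reward += 0.0 if success else 1.0  # adversarial teacher reward
        results.append((instruction, success))
    return results


if __name__ == "__main__":
    random.seed(0)
    # Multiple teachers, per the abstract's multi-teacher finding.
    teachers = [Teacher(grid_size=5) for _ in range(3)]
    student = Student()
    for _ in range(20):
        curriculum_step(teachers, student)
    print(f"final student skill: {student.skill:.2f}")
```

The key design choice mirrored here is that the teachers' reward opposes the student's, so easy goals stop paying off as the student improves, which is what generates an implicit curriculum.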
- Chaitanya Kharyal
- Sai Krishna Gottipati
- Tanmay Kumar Sinha
- Srijita Das
- Matthew E. Taylor