Language-Guided World Models: A Model-Based Approach to AI Control (2402.01695v3)
Abstract: This paper introduces the concept of Language-Guided World Models (LWMs) -- probabilistic models that can simulate environments by reading texts. Agents equipped with these models provide humans with more extensive and efficient control, allowing them to simultaneously alter agent behaviors in multiple tasks via natural verbal communication. In this work, we take initial steps in developing robust LWMs that can generalize to compositionally novel language descriptions. We design a challenging world modeling benchmark based on the game of MESSENGER (Hanjie et al., 2021), featuring evaluation settings that require varying degrees of compositional generalization. Our experiments reveal the lack of generalizability of the state-of-the-art Transformer model, as it offers marginal improvements in simulation quality over a no-text baseline. We devise a more robust model by fusing the Transformer with the EMMA attention mechanism (Hanjie et al., 2021). Our model substantially outperforms the Transformer and approaches the performance of a model with an oracle semantic parsing and grounding capability. To demonstrate the practicality of this model in improving AI safety and transparency, we simulate a scenario in which the model enables an agent to present plans to a human before execution, and to revise plans based on their language feedback.
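The abstract describes fusing a Transformer world model with the EMMA attention mechanism (Hanjie et al., 2021), which grounds each game entity in the piece of text that describes it. As a rough illustrative sketch, not the paper's actual implementation, the core of an EMMA-style grounding step can be written as single-head dot-product attention in which entity embeddings query the manual's token embeddings; the function name, shapes, and the residual combination below are all our own assumptions:

```python
import numpy as np

def emma_style_attention(entities: np.ndarray, tokens: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of EMMA-style grounding: each entity embedding
    attends over the manual's token embeddings, so every entity is paired
    with the description most relevant to it.

    entities: (num_entities, dim) embeddings of on-screen game entities
    tokens:   (num_tokens, dim) embeddings of the manual's tokens
    """
    d = entities.shape[-1]
    scores = entities @ tokens.T / np.sqrt(d)        # (num_entities, num_tokens)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over tokens
    return entities + weights @ tokens               # text-conditioned entity features

rng = np.random.default_rng(0)
entities = rng.normal(size=(4, 32))   # e.g. 4 game entities
tokens = rng.normal(size=(16, 32))    # e.g. a 16-token manual
grounded = emma_style_attention(entities, tokens)
assert grounded.shape == (4, 32)
```

In the full model, these grounded entity features would then condition a Transformer dynamics backbone that predicts the next game state; that backbone is omitted here.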
- Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3674–3683, 2018.
- Natural language communication with robots. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 751–761, 2016.
- S.R.K. Branavan. Learning to win by reading manuals in a Monte-Carlo framework. Journal of Artificial Intelligence Research, 43:661–704, 2012.
- Can transformers jump around right in natural language? Assessing performance transfer from SCAN. In BlackboxNLP Workshop (EMNLP), 2021.
- Deep reinforcement learning in a handful of trials using probabilistic dynamics models. Advances in neural information processing systems, 31, 2018.
- Emergent communication with world models. arXiv e-prints, 2020.
- PILCO: A model-based and data-efficient approach to policy search. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 465–472, 2011.
- Faith and fate: Limits of transformers on compositionality. In Proceedings of Advances in Neural Information Processing Systems, 2023.
- Deep visual foresight for planning robot motion. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 2786–2793. IEEE, 2017.
- World models. arXiv preprint arXiv:1803.10122, 2018.
- Dream to control: Learning behaviors by latent imagination. arXiv preprint arXiv:1912.01603, 2019.
- Mastering Atari with discrete world models. arXiv preprint arXiv:2010.02193, 2020.
- Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023.
- Grounding language to entities and dynamics for generalization in reinforcement learning. In International Conference on Machine Learning, pp. 4051–4062. PMLR, 2021.
- Inducing transformer’s compositional generalization ability via auxiliary sequence prediction tasks. In Proceedings of Empirical Methods in Natural Language Processing, 2021.
- Measuring compositional generalization: A comprehensive method on realistic data. In Proceedings of the International Conference on Learning Representations, 2020.
- Learning to model the world with language. arXiv preprint arXiv:2308.01399, 2023.
- Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
- Transformers are sample-efficient world models. In Proceedings of the International Conference on Learning Representations, 2023.
- Mapping instructions to actions in 3d environments with visual goal prediction. arXiv preprint arXiv:1809.00786, 2018.
- Grounding language for transfer in deep reinforcement learning. Journal of Artificial Intelligence Research, 63:849–874, 2018.
- Help, ANNA! Visual navigation with natural multimodal assistance via retrospective curiosity-encouraging imitation learning. arXiv preprint arXiv:1909.01871, 2019.
- Interactive learning from activity description. In International Conference on Machine Learning, pp. 8096–8108. PMLR, 2021.
- Learning to query internet text for informing reinforcement learning agents. arXiv preprint arXiv:2205.13079, 2022.
- Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022.
- LangWM: Language grounded world model. arXiv preprint arXiv:2311.17593, 2023.
- Transformer-based world models are happy with 100k interactions. In Proceedings of the International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=TdBaDGCpjly.
- A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp. 627–635. JMLR Workshop and Conference Proceedings, 2011.
- Training language models with language feedback at scale. arXiv preprint arXiv:2303.16755, 2023.
- Jürgen Schmidhuber. An on-line algorithm for dynamic reinforcement learning and planning in reactive environments. In 1990 IJCNN international joint conference on neural networks, pp. 253–258. IEEE, 1990a.
- Jürgen Schmidhuber. Making the world differentiable: on using self supervised fully recurrent neural networks for dynamic reinforcement learning and planning in non-stationary environments, volume 126. Inst. für Informatik, 1990b.
- Jürgen Schmidhuber. A possibility for implementing curiosity and boredom in model-building neural controllers. In Proc. of the international conference on simulation of adaptive behavior: From animals to animats, pp. 222–227, 1991.
- Jürgen Schmidhuber. On learning to think: Algorithmic information theory for novel combinations of reinforcement learning controllers and recurrent neural world models. arXiv preprint arXiv:1511.09249, 2015.
- Show or tell? exploring when (and why) teaching with language outperforms demonstration. Cognition, 232:105326, 2023.
- Attention is all you need. Advances in neural information processing systems, 30, 2017.
- Paul J Werbos. Learning how the world works: Specifications for predictive networks in robots and brains. In Proceedings of IEEE International Conference on Systems, Man and Cybernetics, NY, 1987.
- Read and reap the rewards: Learning to play Atari with the help of instruction manuals. In Workshop on Reincarnating Reinforcement Learning at ICLR 2023, 2023a.
- SPRING: Studying papers and reasoning to play games. In Thirty-seventh Conference on Neural Information Processing Systems, 2023b.
- Progressively efficient learning. arXiv preprint arXiv:2310.13004, 2023.
- RTFM: Generalising to new environment dynamics via reading. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=SJgob6NKvH.
- SILG: The multi-environment symbolic interactive language grounding benchmark. In Neural Information Processing Systems (NeurIPS), 2021.