EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (2403.12014v2)

Published 18 Mar 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Recent SOTA approaches for embodied learning via interaction directly employ LLMs as agents to determine the next steps in an environment. Due to their world knowledge and reasoning capabilities, LLM agents achieve stronger performance than previous smaller agents based on reinforcement learning (RL); however, frequently calling LLMs is slow and expensive. Instead of directly employing LLMs as agents, can we use LLMs' reasoning capabilities to adaptively create training environments to help smaller RL agents learn useful skills that they are weak at? We propose EnvGen, a novel framework to address this question. We first prompt an LLM to generate training environments by giving it the task description and simulator objectives that the agents should learn and then asking it to generate a set of environment configurations (e.g., different terrains, items initially given to agents, etc.). Next, we train a small RL agent in a mixture of the original and LLM-generated environments. Then, we enable the LLM to continuously adapt the generated environments to progressively improve the skills that the agent is weak at, by providing feedback to the LLM in the form of the agent's performance. We demonstrate the usefulness of EnvGen with comprehensive experiments in Crafter and Heist environments. We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster. We also show that using an LLM to adapt environments dynamically outperforms curriculum learning approaches and how the environments are adapted to help improve RL agents' weaker skills over time. Additionally, EnvGen is substantially more efficient as it only uses a small number of LLM calls (e.g., 4 in total), whereas LLM agents require thousands of calls. Lastly, we present detailed ablation studies for EnvGen design choices.

Adaptive Environment Generation with LLMs for Enhanced Training of Embodied Agents

Introduction to the EnvGen Framework

Recent advances in embodied AI emphasize learning through interaction with an environment rather than from static datasets. Complex, open-ended environments demand agents capable of long-horizon planning, which remains difficult for conventional reinforcement learning (RL) because rewards are sparse and delayed. This paper introduces EnvGen, a framework that uses LLMs to dynamically create and adapt training environments for small RL agents. By generating environments tailored to an agent's current weaknesses, EnvGen enables efficient skill acquisition, particularly for tasks that require long action sequences.
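
As a concrete but hypothetical illustration (the exact schema is not taken from the paper), an LLM-generated environment configuration for a Crafter-style simulator might specify terrain composition, the agent's starting inventory, and the skills the environment is meant to exercise:

    # Hypothetical example of an LLM-generated environment configuration
    # for a Crafter-style simulator; field names are illustrative only.
    example_config = {
        "terrain": {"tree": "many", "stone": "many", "water": "few"},
        "initial_inventory": {"wood_pickaxe": 1, "sapling": 2},
        "target_skills": ["collect_iron", "make_iron_pickaxe"],
    }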

Challenges with Long-horizon Task Learning

Traditional RL agents often struggle with tasks that require unlocking a sequence of achievements, primarily because the rewards for such tasks are sparse and delayed. LLMs, equipped with extensive world knowledge and strong reasoning capabilities, offer a promising alternative, but employing them directly as agents is slow and expensive because every decision step requires an LLM call.

EnvGen: Adaptive Environment Generation

EnvGen sidesteps the cost of direct LLM use by instead leveraging the LLM to generate and adapt training environments. Given a prompt describing the task, the simulator's capabilities, and the objectives the agent should learn, the LLM proposes a set of environment configurations. A small RL agent is then trained in a mixture of these LLM-generated environments and the original environment, and its performance in the original setting is fed back to the LLM. This feedback loop allows iterative refinement, with the LLM tailoring subsequent environments to target the agent's weakest skills. The result is a cost-effective training scheme that requires only a handful of LLM calls in total (four in the paper's experiments) rather than one per decision step, as sketched below.
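
The sketch below shows the overall generate-train-evaluate-adapt cycle. The helper functions (query_llm_for_configs, make_env, train_agent, evaluate_skills) are hypothetical placeholders standing in for components the paper describes, not the authors' actual API:

    # Minimal sketch of an EnvGen-style training loop; helper functions are
    # hypothetical placeholders, not the authors' released implementation.
    def envgen_training(task_description, original_env_spec, n_cycles=4):
        agent, feedback = None, None
        for _ in range(n_cycles):
            # One LLM call per cycle: propose environment configurations,
            # conditioned on the task description and the latest skill feedback.
            configs = query_llm_for_configs(task_description, feedback)

            # Train the small RL agent in a mixture of LLM-generated
            # environments and the original environment.
            envs = [make_env(c) for c in configs] + [make_env(original_env_spec)]
            agent = train_agent(agent, envs)

            # Evaluate in the original environment and summarize per-skill
            # success rates; this becomes feedback for the next LLM call.
            feedback = evaluate_skills(agent, make_env(original_env_spec))
        return agent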

Empirical Validation

The effectiveness of EnvGen is validated through comprehensive experiments in the Crafter and Heist environments. RL agents trained with EnvGen surpass state-of-the-art methods on complex, long-horizon tasks, and notably, a small RL agent trained with EnvGen outperforms a GPT-4-driven agent. This highlights that EnvGen can exploit LLM capabilities without incurring prohibitive computational or financial cost.

Theoretical Implications and Practical Applications

The EnvGen framework exemplifies the practical integration of LLMs into RL workflows, deviating from direct usage paradigms. This technique opens new avenues for exploiting LLMs' comprehensive world knowledge and reasoning prowess in a manner that is both computationally and economically viable. The ability of EnvGen to adaptively refine training environments based on agent performance underscores the potential of LLMs in crafting highly specialized, skill-targeted learning contexts.

Future Perspectives in AI Training

EnvGen marks a significant step forward in the symbiotic use of LLMs and RL agents, providing a blueprint for future explorations in adaptive learning environments. As LLMs continue to evolve, their integration into embodied AI training through frameworks like EnvGen could revolutionize our approach to nurturing intelligent, highly capable agents. Future research may explore the extension of this methodology across a broader spectrum of simulation environments, further cementing the role of LLMs in the efficient training of embodied agents.

Conclusion

EnvGen presents a novel approach to leveraging the analytical strengths of LLMs for the advancement of embodied AI. By refocusing the role of LLMs from direct action planning to the generation and adaptation of training environments, EnvGen offers a scalable, efficient method for enhancing RL agent performance. This work paves the way for innovative uses of LLMs in AI training, promising significant improvements in agent learning efficiency and skill acquisition within complex, dynamic environments.

Authors (5)
  1. Abhay Zala
  2. Jaemin Cho
  3. Han Lin
  4. Jaehong Yoon
  5. Mohit Bansal