ADAM: An Embodied Causal Agent in Open-World Environments (2410.22194v1)
Abstract: In open-world environments like Minecraft, existing agents face challenges in continuously learning structured knowledge, particularly causality. These challenges stem from the opacity inherent in black-box models and an excessive reliance on prior knowledge during training, which impair their interpretability and generalization capability. To this end, we introduce ADAM, An emboDied causal Agent in Minecraft, that can autonomously navigate the open world, perceive multimodal contexts, learn causal world knowledge, and tackle complex tasks through lifelong learning. ADAM is empowered by four key components: 1) an interaction module, enabling the agent to execute actions while documenting the interaction processes; 2) a causal model module, tasked with constructing an ever-growing causal graph from scratch, which enhances interpretability and diminishes reliance on prior knowledge; 3) a controller module, comprising a planner, an actor, and a memory pool, which uses the learned causal graph to accomplish tasks; 4) a perception module, powered by multimodal LLMs, which enables ADAM to perceive like a human player. Extensive experiments show that ADAM constructs an almost perfect causal graph from scratch, enabling efficient task decomposition and execution with strong interpretability. Notably, in our modified Minecraft games where no prior knowledge is available, ADAM maintains its performance and shows remarkable robustness and generalization capability. ADAM pioneers a novel paradigm that integrates causal methods and embodied agents in a synergistic manner. Our project page is at https://opencausalab.github.io/ADAM.
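To make the interplay between the causal model module and the controller more concrete, the sketch below shows one simple way an agent could grow a causal graph over items from scratch and then use it to decompose a task. This is a hypothetical illustration, not ADAM's actual algorithm. It assumes the following:

- The environment can be queried with single-item interventions, where one item is removed from the inventory and the agent checks whether a craft still succeeds.
- Every prerequisite is individually necessary, so there are no alternative (OR) recipes.
- The learned graph is acyclic.

The recipe table `HIDDEN_RULES` and every function name here are invented for the example and only stand in for the game.

```python
from typing import Callable, Dict, List, Set

# Hypothetical, simplified crafting rules used ONLY to simulate an environment.
# The agent never reads this table directly; it only observes success/failure.
HIDDEN_RULES: Dict[str, Set[str]] = {
    "planks": {"log"},
    "stick": {"planks"},
    "crafting_table": {"planks"},
    "wooden_pickaxe": {"planks", "stick", "crafting_table"},
}


def simulate_craft(target: str, inventory: Set[str]) -> bool:
    """Toy environment: crafting succeeds iff all hidden prerequisites are held."""
    return HIDDEN_RULES.get(target, set()) <= inventory


def discover_parents(target: str, inventory: Set[str],
                     env: Callable[[str, Set[str]], bool]) -> Set[str]:
    """Intervention test: remove one item at a time; if the action then fails,
    record that item as a causal parent of the target.
    Assumes each prerequisite is individually necessary (no OR-recipes)."""
    if not env(target, inventory):
        raise ValueError(f"{target} cannot be crafted from the given inventory")
    parents: Set[str] = set()
    for item in inventory:
        if not env(target, inventory - {item}):
            parents.add(item)
    return parents


def build_graph(targets: List[str], inventory: Set[str],
                env: Callable[[str, Set[str]], bool]) -> Dict[str, Set[str]]:
    """Map each target item to its discovered causal parents."""
    return {t: discover_parents(t, inventory, env) for t in targets}


def decompose(goal: str, graph: Dict[str, Set[str]]) -> List[str]:
    """Post-order DFS over learned parents, so prerequisites precede the goal.
    Assumes the learned graph is acyclic."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(node: str) -> None:
        if node in seen:
            return
        seen.add(node)
        for parent in sorted(graph.get(node, set())):
            visit(parent)
        order.append(node)

    visit(goal)
    return order


if __name__ == "__main__":
    # "dirt" is a distractor; the interventions should not link it to anything.
    inventory = {"log", "planks", "stick", "crafting_table", "dirt"}
    graph = build_graph(list(HIDDEN_RULES), inventory, simulate_craft)
    print(decompose("wooden_pickaxe", graph))
    # ['log', 'planks', 'crafting_table', 'stick', 'wooden_pickaxe']
```

Separating discovery (`build_graph`) from planning (`decompose`) mirrors the abstract's split between the causal model module and the controller. Because the plan is read off an explicit graph, each subgoal can be traced back to the intervention that justified it. That traceability is the kind of interpretability the abstract contrasts with black-box models.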