Large Language Model-based Human-Agent Collaboration for Complex Task Solving (2402.12914v1)
Abstract: The use of LLMs to build fully autonomous agents has recently attracted significant interest in the research community. Nevertheless, LLM-based agents often struggle to adapt to dynamic environments and to fully grasp human needs. In this work, we introduce the problem of LLM-based human-agent collaboration for complex task solving, exploring the synergistic potential of the two. We further propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC, which includes a policy model that determines the most opportune stages for human intervention within the task-solving process. We construct a human-agent collaboration dataset to train this policy model in an offline reinforcement learning setting, and our experiments confirm the model's effectiveness. The results demonstrate that the synergy between humans and LLM-based agents significantly improves performance on complex tasks, primarily through well-planned, limited human intervention. Datasets and code are available at: https://github.com/XueyangFeng/ReHAC.
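To make the mechanism in the abstract concrete, below is a minimal, hypothetical sketch of the kind of intervention policy it describes: at each step of an agent's trajectory, a small network scores the current state and decides whether to hand control to a human, and it is trained offline with a REINFORCE-style policy gradient over logged human-agent trajectories. All names (`InterventionPolicy`, `reinforce_update`, the tuple layout) and the reward structure are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of an intervention policy for human-agent collaboration.
# The class/function names and the return-weighted REINFORCE update are
# assumptions for illustration; the abstract only states that a policy model
# decides when humans should intervene and is trained with offline RL.
import torch
import torch.nn as nn


class InterventionPolicy(nn.Module):
    """Scores the current task state; outputs P(ask human | state)."""

    def __init__(self, state_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Probability that a human should take over at this step.
        return torch.sigmoid(self.net(state)).squeeze(-1)


def reinforce_update(policy: InterventionPolicy,
                     optimizer: torch.optim.Optimizer,
                     trajectories) -> None:
    """One offline policy-gradient step over logged trajectories.

    Each trajectory is assumed to be a list of (state, intervened, ret)
    tuples, where `ret` is the observed task return (e.g., task reward
    minus an intervention cost) from the offline collaboration dataset.
    """
    loss = torch.tensor(0.0)
    for traj in trajectories:
        for state, intervened, ret in traj:
            p = policy(state)
            log_prob = torch.log(p if intervened else 1.0 - p)
            # Maximize return-weighted log-likelihood of logged decisions.
            loss = loss - log_prob * ret
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

At inference time such a policy would run alongside the agent, pausing it to request human input whenever the predicted intervention probability exceeds a threshold; how the paper balances task reward against the cost of human effort is not specified in the abstract, so the return signal above is purely an assumption.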
Authors: Xueyang Feng, Zhi-Yuan Chen, Yujia Qin, Yankai Lin, Xu Chen, Zhiyuan Liu, Ji-Rong Wen