Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents (2402.09205v2)
Abstract: Current LLM-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions. Although adept at devising strategies and performing tasks, these agents struggle to seek clarification and grasp precise user intentions. To bridge this gap, we introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries. Next, we propose incorporating model experts upstream in agent designs to enhance user-agent interaction. Employing IN3, we empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires about user intentions, and refines them into actionable goals before downstream agent task execution begins. Integrating it into the XAgent framework, we comprehensively evaluate the enhanced agent system on user instruction understanding and execution, revealing that our approach notably excels at identifying vague user tasks, recovering and summarizing critical missing information, setting precise and necessary agent execution goals, and minimizing redundant tool usage, thus boosting overall efficiency. All data and code are released.