On the Multi-turn Instruction Following for Conversational Web Agents (2402.15057v1)
Abstract: Web agents powered by LLMs have demonstrated remarkable abilities in planning and executing multi-step interactions within complex web-based environments, fulfilling a wide range of web navigation tasks. Despite these advancements, the potential for LLM-powered agents to effectively engage with sequential user instructions in real-world scenarios has not been fully explored. In this work, we introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions that span multiple turns with both the users and the environment, supported by a specially developed dataset named Multi-Turn Mind2Web (MT-Mind2Web). To tackle the limited context length of LLMs and the context-dependency issue of conversational tasks, we further propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory utilization and self-reflection techniques. Extensive experiments benchmark the MT-Mind2Web dataset and validate the effectiveness of the proposed method.
Authors: Yang Deng, Xuan Zhang, Wenxuan Zhang, Yifei Yuan, See-Kiong Ng, Tat-Seng Chua