
On the Multi-turn Instruction Following for Conversational Web Agents (2402.15057v1)

Published 23 Feb 2024 in cs.CL and cs.AI

Abstract: Web agents powered by LLMs have demonstrated remarkable abilities in planning and executing multi-step interactions within complex web-based environments, fulfilling a wide range of web navigation tasks. Despite these advancements, the potential for LLM-powered agents to effectively engage with sequential user instructions in real-world scenarios has not been fully explored. In this work, we introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions that span multiple turns with both the users and the environment, supported by a specially developed dataset named Multi-Turn Mind2Web (MT-Mind2Web). To tackle the limited context length of LLMs and the context-dependency issue of the conversational tasks, we further propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory utilization and self-reflection techniques. Extensive experiments are conducted to benchmark the MT-Mind2Web dataset, and validate the effectiveness of the proposed method.

Authors (6)
  1. Yang Deng
  2. Xuan Zhang
  3. Wenxuan Zhang
  4. Yifei Yuan
  5. See-Kiong Ng
  6. Tat-Seng Chua