On the Multi-turn Instruction Following for Conversational Web Agents (2402.15057v1)

Published 23 Feb 2024 in cs.CL and cs.AI

Abstract: Web agents powered by LLMs have demonstrated remarkable abilities in planning and executing multi-step interactions within complex web-based environments, fulfilling a wide range of web navigation tasks. Despite these advancements, the potential for LLM-powered agents to effectively engage with sequential user instructions in real-world scenarios has not been fully explored. In this work, we introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions that span multiple turns with both the users and the environment, supported by a specially developed dataset named Multi-Turn Mind2Web (MT-Mind2Web). To tackle the limited context length of LLMs and the context-dependency issue of the conversational tasks, we further propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory utilization and self-reflection techniques. Extensive experiments are conducted to benchmark the MT-Mind2Web dataset, and validate the effectiveness of the proposed method.

Authors

Yang Deng
Xuan Zhang
Wenxuan Zhang
Yifei Yuan
See-Kiong Ng
Tat-Seng Chua
