- The paper demonstrates that inducing reusable workflows from past agent experiences significantly enhances task success rates and efficiency in long-horizon web navigation.
- It uses an LM-based approach to extract common sub-routines from experience and abstract them into generalized workflows, validated in both offline and online settings.
- Results on WebArena and Mind2Web benchmarks reveal up to 51.1% improvement over baselines, showcasing robust cross-task and cross-domain generalization.
The paper "Agent Workflow Memory" by Zora Zhiruo Wang, Jiayuan Mao, Daniel Fried, and Graham Neubig introduces a novel mechanism, Agent Workflow Memory (AWM), to improve the performance and adaptability of LLM-based agents in solving long-horizon digital tasks, particularly web navigation. This essay provides a detailed summary of the method, experiments, and key findings presented in the paper.
Introduction
Large language model (LM)-based agents have shown significant potential in automating digital tasks like web navigation. However, these agents struggle with long-horizon tasks that require extended action trajectories, due to a lack of robustness and adaptability. Humans excel in such scenarios by learning reusable task workflows from past experiences and applying them to future tasks. Inspired by this human capability, AWM is designed to induce and utilize task workflows to guide LM-based agents.
Methodology
Problem Statement
The authors focus on agents with an LM backbone and text-based memory, capable of performing tasks specified by natural language (NL) instructions. The agent operates in an environment defined by a transition function, generating actions iteratively until task completion or termination. Experiences are logged as trajectories of interleaved observations and actions; these trajectories are the raw material for the workflow induction process.
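The interaction loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `policy`, `transition`, and the `"stop"` sentinel are hypothetical stand-ins for the LM agent, the environment dynamics, and task termination.

```python
from typing import Callable, List, Tuple

# Hypothetical simplification: observations and actions are plain strings.
Observation = str
Action = str

def run_agent(
    instruction: str,
    initial_obs: Observation,
    policy: Callable[[str, List[Tuple[Observation, Action]], Observation], Action],
    transition: Callable[[Observation, Action], Observation],
    max_steps: int = 10,
) -> List[Tuple[Observation, Action]]:
    """Roll out an agent until it emits 'stop' or exhausts the step budget.

    The returned (observation, action) trajectory is the logged experience
    that workflow induction later draws on.
    """
    trajectory: List[Tuple[Observation, Action]] = []
    obs = initial_obs
    for _ in range(max_steps):
        action = policy(instruction, trajectory, obs)
        trajectory.append((obs, action))
        if action == "stop":
            break
        obs = transition(obs, action)
    return trajectory
```

In the real setting, `policy` would be an LM call conditioned on the instruction, memory, and current webpage observation.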
Workflow Representation
Workflows in AWM comprise:
- Workflow Description: An NL summary of the workflow's function.
- Workflow Trajectory: A sequence of steps (observations and actions) necessary to complete the workflow.
The representation is designed to generalize across tasks by abstracting example-specific details.
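A workflow with these two components might be modeled as below. This is an illustrative sketch only; the field names (`reasoning`, `action`) and the text serialization are assumptions, not the paper's exact format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class WorkflowStep:
    reasoning: str  # NL description of why this step is taken
    action: str     # abstracted action, with example-specific values
                    # (e.g. a concrete search term) replaced by placeholders

@dataclass
class Workflow:
    description: str                              # NL summary of the workflow's function
    steps: List[WorkflowStep] = field(default_factory=list)

    def render(self) -> str:
        """Serialize the workflow as text for inclusion in the agent's prompt."""
        lines = [f"## {self.description}"]
        lines += [f"{i + 1}. {s.reasoning} -> {s.action}"
                  for i, s in enumerate(self.steps)]
        return "\n".join(lines)
```

Abstracting example-specific details (replacing a concrete query with a `{query}` placeholder, say) is what lets one induced workflow serve many future task instances.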
Workflow Induction and Utilization
The core of AWM is the induction module, which extracts workflows from past agent experiences. Induction can be carried out with different methods, including an LM-based approach in which the model identifies common sub-routines across trajectories and abstracts them into workflows. These workflows are then integrated into the agent's memory, which the agent consults during task-solving.
AWM operates in two scenarios:
- Offline: Workflows are induced from annotated examples available before inference.
- Online: Workflows are induced and adapted on-the-fly during task inference, making AWM flexible even in the absence of training data.
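The online setting can be summarized as a simple loop: solve each task with the current memory, filter for likely-successful trajectories (no gold labels are available online), and fold newly induced workflows back into memory. The sketch below is a hedged simplification; `solve`, `induce`, and `judge` are stand-ins for the LM agent, the LM-based induction call, and a model-based success filter.

```python
from typing import Callable, List

def online_awm(
    tasks: List[str],
    solve: Callable[[str, List[str]], str],   # agent: (task, memory) -> trajectory text
    induce: Callable[[str, str], List[str]],  # LM stand-in: (task, trajectory) -> workflows
    judge: Callable[[str, str], bool],        # stand-in success filter
) -> List[str]:
    """Simplified online AWM loop: memory grows as tasks stream in,
    so later tasks benefit from workflows induced on earlier ones."""
    memory: List[str] = []
    for task in tasks:
        trajectory = solve(task, memory)
        if judge(task, trajectory):           # keep only trajectories judged successful
            memory.extend(induce(task, trajectory))
    return memory
```

The offline variant replaces the streaming loop with a single induction pass over annotated training examples before inference begins.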
Experiments
The authors evaluate AWM on two major web navigation benchmarks, WebArena and Mind2Web, comparing its performance to state-of-the-art methods.
WebArena
WebArena provides a robust environment for evaluating web navigation tasks across diverse domains. AWM achieved significantly higher success rates compared to baselines like BrowserGym, with up to 51.1% relative improvement. Noteworthy findings include:
- Efficiency: AWM reduced the number of steps required for task completion.
- Generalization: It demonstrated robust cross-task and cross-template generalization, indicating its capability to adapt to diverse task templates effectively.
- Incremental Learning: AWM efficiently built increasingly complex workflows, improving performance as more tasks were completed.
Mind2Web
Mind2Web tests agent generalization in cross-task, cross-website, and cross-domain settings. AWM was evaluated in both offline and online scenarios:
- Cross-Task: AWM outperformed baselines in step and task success rates.
- Cross-Website and Cross-Domain: AWM demonstrated superior generalization capabilities, bridging the gap between training and test distributions.
Exploring Optimal Workflow Representations
The authors explored different workflow representations:
- Sub-routine vs. Abstract Formats: Both yielded consistently high performance, with only minor differences between rule-based and LM-based induction.
- Text vs. Code: Both formats were effective, with slight advantages in different metrics.
- Enhanced Observations: Augmenting workflows with concrete HTML states did not significantly improve performance compared to NL descriptions alone.
Workflow Utilization Beyond Memory Augmentation
The paper also examined expanding the agent's action space with workflows, letting the agent invoke a workflow as a single higher-level action that expands into a pre-determined sequence of primitive steps. While this approach showed promise, dynamic environment changes between steps posed challenges, indicating the need for more advanced integration techniques.
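The action-space expansion can be illustrated as a macro-action table. This is a hypothetical sketch, not the paper's implementation: the `search_repo` workflow, its steps, and the `{query}` placeholder are invented for illustration.

```python
from typing import Dict, List

# A workflow registered as a macro-action: a name mapped to primitive
# action templates whose {placeholders} are filled at invocation time.
WORKFLOW_ACTIONS: Dict[str, List[str]] = {
    "search_repo": [
        "click('search box')",
        "type('{query}')",
        "press('Enter')",
    ],
}

def expand(action: str, **kwargs: str) -> List[str]:
    """Expand a workflow action into primitive steps; pass through
    actions that are not registered workflows unchanged."""
    steps = WORKFLOW_ACTIONS.get(action)
    if steps is None:
        return [action]
    return [s.format(**kwargs) for s in steps]
```

The difficulty the paper observes follows directly from this design: each expanded primitive step must still be grounded against the live page, and if the page changes mid-sequence, the pre-determined steps may no longer apply.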
Related Work
The paper situates AWM within the context of web agent benchmarks and efforts to enhance agent capabilities through improved action spaces and memory augmentation. It draws parallels with existing works on learning reusable procedures from experiences, highlighting AWM's contributions to flexible and efficient workflow induction.
Conclusion
AWM presents a significant advancement in improving LM-based agents' adaptability and performance in long-horizon tasks. Through inducing and utilizing reusable workflows, AWM achieves substantial improvements in task success rates and demonstrates strong generalization across diverse digital tasks. Future work could explore more advanced techniques for integrating workflows in dynamic environments and further enhancing workflow induction methods.
Acknowledgments
The authors acknowledge the contributions of several individuals and funding sources, emphasizing the collaborative nature of this research endeavor.
This summary provides an expert overview of the "Agent Workflow Memory" paper, highlighting its methodology, experimental results, and contributions to the field of intelligent agents and web navigation.