- The paper introduces Agent Workflow Memory (AWM), a framework that abstracts recurring sub-task routines to guide LM agent actions and improve adaptation.
- The paper demonstrates that both online and offline workflow induction modes yield significant performance gains, including a 51.1% relative increase in success rate on the WebArena benchmark.
- The paper shows that abstracted workflows enhance cross-domain generalization by reducing task steps and compensating for domain shifts through continual memory augmentation.
Agent Workflow Memory: Inducing and Leveraging Reusable Task Workflows for LM Agents
The proliferation of LM-based agents for web and digital workflow navigation has exposed critical deficiencies in handling long-horizon, complex tasks spanning variable domains and shifting contexts. Agents traditionally rely on training with fixed example sets or in-context demonstrations, but such paradigms fail to equip them with robustness against distributional shift and environment changes—most notably, they lack mechanisms for extracting, abstracting, and continually leveraging recurrent task routines shared across tasks. Inspired by cognitive mechanisms of human expertise, "Agent Workflow Memory" (AWM) (arXiv:2409.07429) proposes a framework for continual induction and utilization of workflows—abstracted sub-task routines—embedded in agent memory to guide future action selection and enhance adaptation.
Workflow Induction: Representation and Mechanism
AWM operationalizes workflows as memory objects with two key components: (1) a natural-language (NL) description summarizing the high-level routine, and (2) a sequence of parameterized steps (each consisting of environment observation, agent reasoning, and executable action). The induction process leverages LM prompting to abstract common sub-trajectories from past agent trajectories, systematically replacing instance-specific context with generic variables to maximize workflow generality. This abstraction facilitates both intra-task reuse and cross-task generalization, distinguishing workflows from concrete training examples, which can bias agents towards overly specific behaviors.
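The two-part workflow representation can be sketched as a small data structure. This is a minimal illustration, not the paper's actual code: the class names, the `{placeholder}` variable syntax, and the `instantiate` helper are all assumptions introduced here for concreteness.

```python
from dataclasses import dataclass, field

@dataclass
class WorkflowStep:
    """One parameterized step: observation, reasoning, and an action
    whose instance-specific values are replaced by {placeholders}."""
    observation: str
    reasoning: str
    action: str

@dataclass
class Workflow:
    """A reusable routine: an NL description plus parameterized steps."""
    description: str
    steps: list[WorkflowStep] = field(default_factory=list)

    def instantiate(self, **bindings: str) -> list[str]:
        """Bind the generic variables to task-specific values."""
        return [step.action.format(**bindings) for step in self.steps]

# A generic "search by name" routine with {query} as the abstracted variable
wf = Workflow(
    description="Search a site for an item by name",
    steps=[
        WorkflowStep("search box visible", "locate the search field",
                     "type [search] [{query}]"),
        WorkflowStep("query entered", "submit the search",
                     "press [Enter]"),
    ],
)
print(wf.instantiate(query="red hat"))
```

The abstraction step—swapping concrete values like "red hat" for `{query}`—is what lets one induced routine serve many task instantiations.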
Workflow induction is contextualized in two modes:
- Offline: Workflows are abstracted from annotated training examples before inference, and the same induced set augments agent memory uniformly at test time.
- Online: Workflows are extracted dynamically from successful test trajectories judged by an LM evaluator, continuously integrated into memory in a streaming manner to enable adaptation to new task distributions.
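The online mode above amounts to a streaming loop: attempt a task with the current memory, and on LM-judged success, induce new workflows from the trajectory back into memory. A minimal sketch, where `solve`, `evaluate`, and `induce` are hypothetical stand-ins for the agent rollout, the LM evaluator, and the LM-based induction prompt:

```python
def online_awm(solve, evaluate, induce, tasks, memory):
    """Sketch of AWM's online mode: solve tasks in a stream; after each
    LM-judged success, induce workflows from the trajectory into memory,
    so later tasks benefit from earlier ones."""
    trajectories = []
    for task in tasks:
        traj = solve(task, memory)        # memory-augmented agent rollout
        if evaluate(task, traj):          # LM evaluator judges success
            memory.extend(induce(traj))   # abstract reusable sub-routines
        trajectories.append(traj)
    return trajectories

# Toy demonstration with stub components standing in for the real agent
memory: list[str] = []
online_awm(
    solve=lambda task, mem: f"trace({task}|mem={len(mem)})",
    evaluate=lambda task, traj: True,            # pretend every task succeeds
    induce=lambda traj: [f"workflow-from-{traj}"],
    tasks=["t1", "t2"],
    memory=memory,
)
print(memory)  # memory grows as successful trajectories accumulate
```

The key design point is that memory is mutated inside the loop, so workflow induction and utilization interleave rather than running as separate phases.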
Experimental Evaluation: WebArena and Mind2Web
AWM is empirically validated across WebArena and Mind2Web benchmarks, both renowned for task complexity and domain diversity. In WebArena, AWM operated exclusively in the online mode due to the absence of high-quality training data and demonstrated marked gains over both autonomous agents (e.g., BrowserGym) and agents with hand-written workflows (e.g., SteP). Specifically:
- WebArena: AWM achieved a 51.1% relative increase in total success rate over the top published autonomous agent, and outperformed human-supervised workflows by 7.9%. It also reduced solution trajectory length by 2 steps on average, confirming both task efficacy and efficiency.
Cross-template subset analysis indicated that workflows induced via AWM are not mere template memorization; they exhibit robust generalization across divergent task instantiations. Case studies illustrate the progressive construction of increasingly complex workflows, wherein prior acquired routines serve as subgoals for new composite workflows (e.g., "find a place by name" facilitating "get the zip code of a place").
- Mind2Web: Offline AWM workflows induced from training data secured the highest step and task success rates compared to leading baselines (Synapse, MindAct), primarily by enhancing element selection accuracy (+9.0 points over MindAct with GPT-4). The abstract, sub-routine-based workflow representation conferred substantial improvements over augmentation with full concrete examples, indicating superior reusability and less biasing in agent decision-making.
Notably, online AWM demonstrated robust cross-task, cross-website, and cross-domain generalization, increasing step success rates by 8.9 to 14.0 absolute points as train-test distribution gaps expanded. Unlike offline workflow induction, online AWM adaptation was observed to compensate for domain misalignment between training and test, thereby maximizing agent performance in unseen environments.
Ablations and Design Variants
AWM's efficacy was further scrutinized through ablation studies:
- LM-Based vs. Rule-Based Induction: LM-based workflow induction provided finer granularity, reducing unnecessary steps and improving efficiency compared to rule-based induction; task success rate differences were marginal on WebArena but more pronounced (+2.8 points) on Mind2Web.
- Representation Format: Verbalizing action trajectories into NL text versus code format yielded comparable performance, indicating that the choice of representation format is not critical to effective memory augmentation.
- Environment State in Workflows: NL descriptions proved more effective than filtered HTML in workflow steps, as the latter increased context length and introduced irrelevant elements, negatively impacting step success rate.
- Workflow Utilization via Action Space Expansion: Integrating workflows as callable high-level actions in the agent's action space (AWM-AS) conferred a minor additional gain in step success rate, but agents were hesitant to invoke the new actions under dynamic environment conditions, suggesting future research directions in enabling dynamic execution loops.
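The action-space-expansion variant can be illustrated with a small expansion function: if the agent emits a workflow name instead of a primitive action, the name is macro-expanded into the workflow's primitive steps. The function name, the space-separated call format, and the `{arg}` placeholder are illustrative assumptions, not the paper's actual interface:

```python
def expand_action(action: str, workflows: dict[str, list[str]]) -> list[str]:
    """If `action` invokes a known workflow, expand it into primitive
    steps with the argument substituted; otherwise pass it through."""
    name, _, arg = action.partition(" ")
    if name in workflows:
        return [step.replace("{arg}", arg) for step in workflows[name]]
    return [action]  # already a primitive action

# One induced workflow registered as a callable high-level action
workflows = {
    "find_place": ["type [search] [{arg}]", "press [Enter]", "click [result]"],
}
print(expand_action("find_place Carnegie Museum", workflows))
print(expand_action("click [42]", workflows))
```

This static macro-expansion view also hints at why agents hesitated: in a dynamic environment, the expanded steps may no longer match the live page state, which is what a dynamic execution loop would address.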
Practical and Theoretical Implications
AWM establishes a scalable strategy for autonomous agent adaptation, moving beyond monolithic example-based training towards incremental, continual refinement of memory. The empirical results highlight the importance of abstracting and leveraging reusable routines as a core competency for agents facing real-world task generalization and environment variability. Practically, AWM's support for both offline and online induction modes, combined with demonstrable cross-domain robustness, suggests its utility for deployments in environments where training data is scarce or unreliable.
From a theoretical perspective, AWM connects cognitive science mechanisms of expertise abstraction with LM-driven decision-making, providing an operational framework for memory-driven agentic behavior. The observed snowball effect in workflow accumulation and utilization underscores potential avenues for recursive skill composition, hierarchical planning, and dynamic memory management in future LLM-based agent architectures.
Future Directions
AWM's modularity invites extensions in:
- Optimal workflow induction strategies (e.g., hybrid rule-LM induction, clustering-based meta-workflow abstraction)
- Real-time workflow adaptation under non-stationary and adversarial environments
- Integration with multimodal perception and closed-loop feedback mechanisms
- Dynamic action space manipulation for highly flexible reasoning and execution
Furthermore, embedding AWM within advanced agent architectures could unlock more general and efficient approaches to open-ended task completion, continual learning, and domain adaptation.
Conclusion
Agent Workflow Memory offers a formal framework for inducing, representing, and leveraging reusable workflows in LM-based agents, yielding substantial empirical gains in task success and generalization across web navigation benchmarks. Its principled abstraction of experience into workflows alleviates context bias, enhances cross-domain adaptability, and accelerates efficient task-solving, constituting a foundational advance in dynamic agent memory and adaptation mechanisms.