Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration
The paper "Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration" introduces a novel approach to training reinforcement learning (RL) agents on web-based tasks. The focus is on the challenge of sparse rewards, which are typical when tasks involve complex sequences of actions, such as booking flights or managing emails through web interfaces. Rather than relying on behavioral cloning, the authors use expert demonstrations to constrain the RL agent's exploration space through induced workflows. This significantly improves sample efficiency and yields state-of-the-art performance on a range of web tasks.
Key Contributions
- Workflow-Guided Exploration (WGE): The paper presents a method in which workflow lattices are induced from expert demonstrations. These workflows constrain exploration to action sequences that broadly mirror expert behavior, pruning unpromising directions and accelerating the agent's discovery of effective action sequences (a minimal sketch follows this list).
- Novel Neural Policy Architecture: The paper describes a neural network architecture, DOMNET, designed for the semi-structured nature of web pages. DOMNET jointly exploits structured inputs (the HTML/DOM tree) and unstructured inputs (natural language goals, images), providing a flexible mechanism for relational reasoning over page elements (see the second sketch after this list).
- Sample Efficiency: The proposed WGE method demonstrates a marked improvement in sample efficiency over traditional behavioral cloning, exceeding 100x: far fewer demonstrations yield significantly better performance than previous techniques.
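To make the workflow idea concrete, below is a minimal Python sketch of how a workflow step can be represented as a constraint on actions and how an episode can be rolled out under a workflow. The `Step` class, the `env` interface (`reset`, `valid_actions`, `step`), and the dictionary-based action encoding are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the workflow idea (illustrative names, not the paper's code).
# A workflow is a sequence of "steps"; each step is a constraint that, given the
# current state, determines which concrete actions are permitted at that time step.

import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    """One workflow step: a predicate over (state, action) pairs."""
    name: str
    allows: Callable[[dict, dict], bool]   # allows(state, action) -> True if permitted

# Example steps that could be induced from a demonstrated Click on a submit button:
click_any_button = Step("click-tag(button)",
                        lambda s, a: a["type"] == "click" and a["elem"]["tag"] == "button")
click_demo_text = Step("click-text('Submit')",
                       lambda s, a: a["type"] == "click" and a["elem"]["text"] == "Submit")

def sample_episode(workflow: List[Step], env):
    """Roll out one episode; at each time step, sample uniformly among the concrete
    actions permitted by the corresponding workflow step. Returns the trace and the
    final (sparse) reward."""
    state, trace, reward = env.reset(), [], 0.0
    for step in workflow:
        candidates = [a for a in env.valid_actions(state) if step.allows(state, a)]
        if not candidates:      # the step is unsatisfiable here: abandon the episode
            break
        action = random.choice(candidates)
        trace.append((state, action))
        state, reward, done = env.step(action)
        if done:
            break
    return trace, reward
```

Because each step only filters the available actions, exploration stays close to the demonstrated strategy while still varying the concrete elements it interacts with.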
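The DOMNET architecture itself is richer than can be shown here; the toy PyTorch module below merely illustrates the flavor of the idea: embed each DOM element from its attributes, align it with the natural-language goal via attention, and produce a distribution over elements to act on. All module names, shapes, and the mean-pooling choice are assumptions for the sketch, not the paper's design.

```python
# Illustrative sketch (not the authors' code) of scoring DOM elements against a goal.

import torch
import torch.nn as nn

class ElementScorer(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, elem_tokens, goal_tokens):
        # elem_tokens: (num_elems, elem_len) token ids from element attributes
        # goal_tokens: (goal_len,) token ids from the natural-language goal
        elems = self.embed(elem_tokens).mean(dim=1)      # (num_elems, dim)
        goal = self.embed(goal_tokens)                   # (goal_len, dim)
        # Each element attends over the goal tokens it aligns with best.
        attn = torch.softmax(elems @ goal.T, dim=-1)     # (num_elems, goal_len)
        ctx = attn @ goal                                # (num_elems, dim)
        logits = self.score(torch.cat([elems, ctx], dim=-1)).squeeze(-1)
        return torch.log_softmax(logits, dim=-1)         # distribution over elements
```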
Methodology
The authors detail a multi-step approach to implement their workflow-guided exploration:
- Workflow Induction: From user demonstrations, the system induces workflows: sequences of high-level steps that constrain which actions are permissible at each time step. These workflows act as a guide for exploration (see the induction sketch after this list).
- Hierarchical Exploration Policy: A workflow exploration policy (πw) first selects a workflow and then samples actions from within its steps in an environment-blind manner, without consulting the current environment state. This keeps the exploration policy simple and reduces the risk of overfitting to the small number of demonstrations.
- Replay Buffer for Neural Policy Training: Successful episodes discovered during exploration are stored in a replay buffer and used to train a more expressive neural policy (πn), which can interact with the page more flexibly; because it learns from these discovered successes rather than directly imitating the demonstrations, it avoids overfitting to them (see the training-loop sketch after this list).
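As a rough illustration of workflow induction, the sketch below (reusing the `Step` class from the earlier sketch) derives a small set of candidate steps per demonstrated action, ordered from most specific to most general; the cross-product of these candidate sets forms the workflow lattice. The helper names and the two-constraint vocabulary are assumptions; the paper's constraint language is richer.

```python
# Sketch of inducing a workflow lattice from one demonstration (hypothetical helpers).

def induce_steps(demo_action):
    """Candidate workflow steps for one demonstrated click action, from most
    specific (match the exact text) to most general (match only the tag)."""
    elem = demo_action["elem"]
    return [
        Step(f"click-text({elem['text']!r})",
             lambda s, a, t=elem["text"]: a["type"] == "click" and a["elem"]["text"] == t),
        Step(f"click-tag({elem['tag']!r})",
             lambda s, a, g=elem["tag"]: a["type"] == "click" and a["elem"]["tag"] == g),
    ]

def induce_lattice(demonstration):
    """A workflow lattice: one set of candidate steps per demonstrated time step.
    A concrete workflow picks one step from each set."""
    return [induce_steps(action) for _, action in demonstration]
```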
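Putting the pieces together, the following is a hedged sketch of the outer exploration loop under the same assumptions as the earlier sketches: sample a workflow from the lattice without looking at the environment state, roll it out, keep only rewarded episodes in the replay buffer, and train the neural policy from that buffer. The `neural_policy.update` method and the buffer handling are simplifications of the paper's actual training procedure.

```python
def train_wge(env, lattice, neural_policy, episodes=1000, buffer_limit=100):
    """Workflow-guided exploration outer loop (sketch)."""
    replay_buffer = []
    for _ in range(episodes):
        # Workflow policy pi_w: pick one candidate step per time step,
        # blind to the environment state.
        workflow = [random.choice(steps) for steps in lattice]
        trace, reward = sample_episode(workflow, env)
        if reward > 0:                               # keep only successful episodes
            replay_buffer.append(trace)
            replay_buffer = replay_buffer[-buffer_limit:]
        if replay_buffer:
            # Update the neural policy pi_n from a replayed success; the paper also
            # mixes in on-policy updates, omitted here. `update` is hypothetical.
            neural_policy.update(random.choice(replay_buffer))
```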
Evaluation and Results
The paper evaluates the approach on several benchmarks, including MiniWoB and MiniWoB++, as well as a more realistic flight-booking interface modeled on Alaska Airlines. The authors show that DOMNET + WGE not only achieves higher success rates on web tasks but also outperforms prior RL approaches that relied on pixel-based inputs and behavioral cloning with far more demonstrations. Notably, tasks with greater complexity, stochastic environments, and natural-language goals see especially large improvements under WGE.
Implications and Future Directions
The implications of this research extend into a few domains:
- Web-based Automation: Enhancing current capabilities of AI personal assistants to autonomously perform complex tasks on human-readable interfaces.
- Sparse Reward Handling: Providing a framework that can be generalized to other domains where reward signals are sparse and traditional exploration methods falter.
- Neural Architecture Design: Encouraging future directions in designing hybrid neural architectures that effectively manage semi-structured data common in real-world applications.
Future research may focus on expanding the constraint language for workflows to integrate more directly with natural language inputs or extend into other platforms requiring complex, high-level task decomposition.
In summary, this work represents a significant step toward applying RL to web-based tasks: it harnesses guided exploration and an expressive neural architecture to cope with the complexity and sparse rewards inherent in real-world web interfaces.