AutoWebGLM: A Large Language Model-based Web Navigating Agent (2404.03648v2)

Published 4 Apr 2024 in cs.CL

Abstract: LLMs have fueled many intelligent web agents, but most existing ones perform far from satisfactorily in real-world web navigation tasks due to three factors: (1) the complexity of HTML text data, (2) the versatility of actions on webpages, and (3) task difficulty due to the open-domain nature of the web. In light of these challenges, we develop the open AutoWebGLM based on ChatGLM3-6B. AutoWebGLM can serve as a powerful automated web navigation agent that outperforms GPT-4. Inspired by human browsing patterns, we first design an HTML simplification algorithm to represent webpages succinctly with vital information preserved. We then employ a hybrid human-AI method to build web browsing data for curriculum training. Finally, we bootstrap the model with reinforcement learning and rejection sampling to further facilitate webpage comprehension, browser operations, and efficient task decomposition on its own. For comprehensive evaluation, we establish a bilingual benchmark -- AutoWebBench -- for real-world web navigation tasks. We evaluate AutoWebGLM across diverse web navigation benchmarks, demonstrating its potential to tackle challenging tasks in real environments. Related code, model, and data are released at \url{https://github.com/THUDM/AutoWebGLM}.

AutoWebGLM: Innovations and Evaluations in AI-Powered Web Navigation Agents

Introduction to AutoWebGLM

The development of AutoWebGLM introduces a significant enhancement in web navigation agent capabilities, employing the ChatGLM3-6B model as its backbone. The agent outperforms strong baselines, including GPT-4, on automated web navigation benchmarks by taking a tailored approach to webpage understanding and interaction. Its contributions center on handling the core complexities of web navigation: unifying diverse action spaces, simplifying HTML for efficient processing, and generating high-quality training trajectories.

Challenges Addressed

AutoWebGLM's design directly confronts the primary hurdles in web navigation automation:

  • Unified Action Space: It defines a comprehensive action space that enables consistent interaction across a wide range of websites (a hypothetical code sketch follows this list).
  • HTML Simplification: An algorithm condenses HTML content while preserving essential information, keeping webpages within the model's token-length constraints.
  • High-quality Training Trajectories: A combination of model-assisted and manual annotation produces a dataset for training robust web navigating agents capable of accurate inference and error correction.
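
Since the summary describes the action space only at a high level, here is a minimal sketch of what a unified browser action space can look like in code. The action names, fields, and serialization format are illustrative assumptions for this sketch, not AutoWebGLM's published interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionType(Enum):
    """Illustrative browser operations a web agent might emit."""
    CLICK = "click"
    TYPE = "type"
    SCROLL = "scroll"
    GO_BACK = "go_back"
    GO_TO_URL = "go_to_url"
    ANSWER = "answer"  # terminate and report a result


@dataclass
class Action:
    """One agent step: an operation plus its arguments."""
    kind: ActionType
    element_id: Optional[int] = None  # target element in the simplified page
    text: Optional[str] = None        # text to type, a URL, or a final answer

    def to_command(self) -> str:
        """Serialize to the flat string form an LLM could be trained to emit."""
        parts = [self.kind.value]
        if self.element_id is not None:
            parts.append(f"#{self.element_id}")
        if self.text is not None:
            parts.append(repr(self.text))
        return " ".join(parts)


# Example trajectory: click element 12, then type a query into element 7.
for step in [
    Action(ActionType.CLICK, element_id=12),
    Action(ActionType.TYPE, element_id=7, text="open-source web agents"),
]:
    print(step.to_command())
```

Keeping every action expressible as a short flat command is what lets a single decoding format carry over to arbitrary websites.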

Methodological Insights

The foundation of AutoWebGLM lies in its methodological innovations:

  • HTML Representation: An HTML simplification algorithm, inspired by human web browsing patterns, significantly reduces the complexity and verbosity of webpages for model comprehension (a minimal code sketch follows this list).
  • Hybrid Human-AI Data Construction: Combining model-assisted and manual annotation enables the rapid assembly of a rich training dataset, refining the model's understanding of web operations and decisions.
  • Curriculum Learning and Reinforcement Approaches: Sequential training stages, combining curriculum learning, reinforcement learning (RL), and rejection sampling finetuning (RFT), progressively improve the model's webpage comprehension, browser operation, and task decomposition.
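
The published pruning rules live in the paper and repository rather than in this summary; as a rough illustration of the idea, the sketch below uses BeautifulSoup (an assumption, not necessarily the paper's tooling) to drop non-content tags, keep interactive elements with their visible labels, and truncate to a character budget that stands in for a token limit.

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Tags that rarely matter for navigation; dropped outright.
NOISE_TAGS = ["script", "style", "svg", "noscript", "meta", "link"]
# Elements the agent may need to act on.
INTERACTIVE_TAGS = ["a", "button", "input", "select", "textarea"]


def simplify_html(html: str, max_chars: int = 4000) -> str:
    """Condense a page into a numbered element list an LLM can consume.

    Illustrative only, not AutoWebGLM's published algorithm: keep
    interactive elements and their visible labels, discard everything
    else, and truncate to a character budget as a token-limit stand-in.
    """
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(NOISE_TAGS):
        tag.decompose()

    lines = []
    for idx, el in enumerate(soup.find_all(INTERACTIVE_TAGS)):
        text = " ".join(el.get_text(" ", strip=True).split())
        fallback = el.get("placeholder") or el.get("aria-label") or el.get("href") or ""
        lines.append(f"[{idx}] <{el.name}> {(text or fallback)[:80]}")

    return "\n".join(lines)[:max_chars]


page = """<html><body>
  <script>var x = 1;</script>
  <a href="/docs">Documentation</a>
  <input placeholder="Search papers" />
  <button>Submit</button>
</body></html>"""
print(simplify_html(page))
# [0] <a> Documentation
# [1] <input> Search papers
# [2] <button> Submit
```

The numeric IDs double as the element handles that commands like `click #12` refer back to, tying the simplified page to the action space sketched earlier.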

Dataset and Benchmark Development

A notable contribution of AutoWebGLM's development is the construction of AutoWebBench, a bilingual (English and Chinese) benchmark that addresses the need for comprehensive evaluation tools in web navigation research. This benchmark is designed to assess an agent's performance in navigating and interacting with real-world webpages, offering insights into the practical applicability of AI-powered web agents.
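
The released data defines AutoWebBench's actual schema; purely as an illustration of what a step-level navigation task can look like, a record might pair an instruction with a gold action trace (every field name below is hypothetical):

```python
# A hypothetical AutoWebBench-style record; field names are illustrative,
# not the released schema.
task = {
    "task_id": "en-0001",
    "language": "en",       # the benchmark is bilingual: English and Chinese
    "instruction": "Find the latest release notes on the project site.",
    "start_url": "https://example.org",
    "reference_actions": [  # gold step-by-step trace to score against
        "click #12",
        "click #3",
    ],
}
```

Scoring agent predictions against such traces is sketched after the empirical findings below.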

Empirical Evaluations and Findings

Extensive testing of AutoWebGLM across multiple benchmarks, including the newly developed AutoWebBench, shows superior performance compared to existing LLM-based web navigating agents. The model not only delivers significant improvements on various web navigation tasks but also highlights areas for further research and development.

  • Performance Metrics: AutoWebGLM achieves high success rates across diverse web navigation benchmarks, showcasing its robustness and versatility (a metric sketch follows this list).
  • Challenges in Real-World Navigation: Despite these results, AutoWebGLM's performance also underlines the difficulty of real-world web navigation and the need for continued improvements in model training and environmental understanding.
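
As one concrete reading of "success rate", step-level benchmarks often score the fraction of gold steps an agent reproduces; the sketch below assumes that position-wise exact-match definition, which may differ in detail from the paper's exact scoring.

```python
def step_success_rate(predicted: list[str], reference: list[str]) -> float:
    """Fraction of gold steps reproduced exactly, position by position.

    An illustrative metric, assumed here rather than taken from the
    paper: compare each predicted action string against the gold trace.
    """
    if not reference:
        return 0.0
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


# The first step matches the gold trace, the second does not: 0.5.
print(step_success_rate(["click #12", "click #3"], ["click #12", "click #5"]))
```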

Concluding Remarks

The introduction of AutoWebGLM marks a pivotal advancement in the field of AI-powered web navigation. By addressing fundamental challenges and integrating innovative training methodologies, AutoWebGLM sets a new standard for the development of intelligent web navigating agents. The AutoWebBench benchmark further enriches research resources, paving the way for future innovations in AI-driven web interactions. As web navigation continues to evolve, AutoWebGLM represents a significant step forward in harnessing the potential of LLMs to navigate the vast expanse of the internet effectively.

Authors

  1. Hanyu Lai
  2. Xiao Liu
  3. Iat Long Iong
  4. Shuntian Yao
  5. Yuxuan Chen
  6. Pengbo Shen
  7. Hao Yu
  8. Hanchen Zhang
  9. Xiaohan Zhang
  10. Yuxiao Dong
  11. Jie Tang