Infogent: An Agent-Based Framework for Web Information Aggregation (2410.19054v1)

Published 24 Oct 2024 in cs.AI and cs.CL

Abstract: Despite seemingly performant web agents on the task-completion benchmarks, most existing methods evaluate the agents based on a presupposition: the web navigation task consists of a linear sequence of actions with an end state that marks task completion. In contrast, our work focuses on web navigation for information aggregation, wherein the agent must explore different websites to gather information for a complex query. We consider web information aggregation from two different perspectives: (i) Direct API-driven Access relies on a text-only view of the Web, leveraging external tools such as Google Search API to navigate the web and a scraper to extract website contents. (ii) Interactive Visual Access uses screenshots of the webpages and requires interaction with the browser to navigate and access information. Motivated by these diverse information access settings, we introduce Infogent, a novel modular framework for web information aggregation involving three distinct components: Navigator, Extractor and Aggregator. Experiments on different information access settings demonstrate Infogent beats an existing SOTA multi-agent search framework by 7% under Direct API-Driven Access on FRAMES, and improves over an existing information-seeking web agent by 4.3% under Interactive Visual Access on AssistantBench.

Summary

The paper introduces a modular framework, Infogent, that integrates a Navigator, Extractor, and Aggregator for advanced web data extraction.
The paper reports gains over existing methods of 7% on FRAMES under Direct API-Driven Access and 4.3% on AssistantBench under Interactive Visual Access.
The paper highlights Infogent’s potential to enhance automated information synthesis in complex and visually demanding web environments.

Infogent: An Agent-Based Framework for Web Information Aggregation

The paper "Infogent: An Agent-Based Framework for Web Information Aggregation" introduces Infogent, a modular framework designed to address complex information aggregation tasks on the web. Unlike traditional web navigation tasks that focus on linear sequences leading to predefined goals, Infogent emphasizes exploratory web navigation necessary for comprehensive information gathering. This framework is particularly relevant for tasks that require synthesizing information from multiple sources to answer complex queries.

Framework Overview and Components

Infogent consists of three key components: Navigator, Extractor, and Aggregator, each playing a distinct role in information aggregation. The Navigator is tasked with exploring the web to identify relevant websites. It operates under two distinct information access settings: Direct API-Driven Access and Interactive Visual Access.

Direct API-Driven Access employs a tool-based LLM agent leveraging search APIs and automated scraping tools. In this setting, the Navigator uses tools for searching and extracting information, and feedback from the Aggregator guides its next steps.
Interactive Visual Access mimics human-like browser interactions using a multimodal web agent to navigate visually complex web pages. The Navigator interacts with web interfaces that require visual understanding and human-like inputs, such as clicking, typing, and pressing enter, and additionally supports feedback-driven navigation through backtracking mechanisms.
Figure 1: Overview of Infogent under the Direct API Access and Interactive Visual Access settings: The Navigator uses a tool-based LLM and a browser-controlling VLM as the web agent respectively, with the Aggregator's textual feedback guiding further navigation.
Figure 2: A working example of Infogent. The Navigator ($\mathcal{NG}$) iteratively generates an updated query given feedback from the Aggregator ($\mathcal{AG}$).
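
The backtracking behavior described for Interactive Visual Access can be illustrated with a small sketch. This is not the paper's implementation: the `BrowserSession` class, the `apply_feedback` policy, and the example URLs are hypothetical names chosen for illustration, and a real system would drive an actual browser from screenshots rather than a list of URLs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BrowserSession:
    """Toy stand-in for a real browser; it only tracks the page history."""
    history: List[str] = field(default_factory=list)

    @property
    def current(self) -> Optional[str]:
        return self.history[-1] if self.history else None

    def open(self, url: str) -> None:
        # Following a link (for example after a click) pushes a new page.
        self.history.append(url)

    def go_back(self) -> Optional[str]:
        # Backtrack one page, but never drop the starting page.
        if len(self.history) > 1:
            self.history.pop()
        return self.current


def apply_feedback(session: BrowserSession, page_was_useful: bool) -> Optional[str]:
    """Hypothetical policy: stay on useful pages, backtrack from unhelpful ones."""
    return session.current if page_was_useful else session.go_back()


session = BrowserSession()
session.open("https://example.com/search")
session.open("https://example.com/result-1")
# Assume the feedback says the result page did not help, so we backtrack.
page_after_feedback = apply_feedback(session, page_was_useful=False)
```

In this sketch, negative feedback returns the agent to the search page, from which it could try a different result.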

The Extractor in both settings is responsible for identifying and extracting relevant content from selected web pages, using LLMs for textual data and multimodal models for screenshot-based data extraction in visual access scenarios.

The Aggregator assesses extracted content, updates the information stack, and provides feedback to the Navigator, enabling adaptive exploration and information synthesis. This feedback-driven interaction ensures dynamic adjustment in aggregation strategies.
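
The interaction between the three components can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' code: the `Aggregator` class, the `run` function, the `"DONE"` stop signal, the item-count stopping rule, and the stub navigator and extractor are all hypothetical. In the framework described above, each role would be played by an LLM or VLM rather than these stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Aggregator:
    """Keeps the information stack and returns textual feedback."""
    needed: int  # hypothetical stopping rule: number of distinct items wanted
    stack: List[str] = field(default_factory=list)

    def update(self, extracted: Optional[str]) -> str:
        # Add only new, non-empty content to the information stack.
        if extracted and extracted not in self.stack:
            self.stack.append(extracted)
        if len(self.stack) >= self.needed:
            return "DONE"
        return f"Have {len(self.stack)} of {self.needed} items; keep searching."


def run(
    query: str,
    navigate: Callable[[str, str], str],
    extract: Callable[[str, str], Optional[str]],
    aggregator: Aggregator,
    max_steps: int = 10,
) -> List[str]:
    feedback = ""
    for _ in range(max_steps):
        page = navigate(query, feedback)       # Navigator picks the next page
        content = extract(query, page)         # Extractor pulls relevant content
        feedback = aggregator.update(content)  # Aggregator updates stack, gives feedback
        if feedback == "DONE":
            break
    return aggregator.stack


# Stub components over toy pages, for illustration only.
PAGES = iter([
    "page A mentions topic X",
    "page B is unrelated",
    "page C also mentions topic X",
    "page D is never reached",
])


def fake_navigate(query: str, feedback: str) -> str:
    return next(PAGES)


def fake_extract(query: str, page: str) -> Optional[str]:
    return page if "topic X" in page else None


result = run("topic X", fake_navigate, fake_extract, Aggregator(needed=2))
```

The loop stops as soon as the Aggregator signals that enough information has been collected, so the fourth toy page is never visited.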

Experimental Results

Infogent's efficacy was demonstrated on datasets requiring complex reasoning and multi-document aggregation. The framework outperformed existing state-of-the-art methods for both API-driven and interactive visual access tasks.

Direct API-Driven Access: On FRAMES, Infogent improved over an existing state-of-the-art multi-agent search framework by 7%, showing that it can aggregate information spread across multiple sources.

(Table 1)

Table 1: Results (in %) on the FRAMES dataset for queries with different reasoning types under the Direct API-Driven Access setting.

Interactive Visual Access: On AssistantBench, where visual interaction with the web is crucial, Infogent improved over an existing information-seeking web agent by 4.3%, indicating that it can handle complex, information-dense webpages.

(Table 2)

Table 2: Accuracy (in %) on AssistantBench in the Interactive Visual Access setting.

Implications and Future Directions

Infogent represents a significant advance in web navigation technology by addressing the challenges of information aggregation in both text-based and visually complex web environments. It provides a foundation for further research into improving web-based information synthesis, offering potential applications in areas such as automated report generation, comprehensive data collection for analysis, and enhanced search systems.

Future work will explore expanding Infogent's capabilities to handle more diverse and dynamic web environments, improving modular component interoperability, and incorporating real-time learning to adapt to rapidly changing web contexts.

Conclusion

Infogent showcases a novel approach to web information aggregation by using autonomous agents to interact with and extract diverse data from the web. Its modular design offers flexibility and adaptability across different web access settings, suggesting practical applications in varied fields requiring comprehensive data synthesis. Its ability to outperform existing frameworks highlights its potential as a robust tool in complex information aggregation tasks.
