Go-Browse: Graph-Based Web Exploration

Updated 22 December 2025

Go-Browse is a method for training web agents using explicit graph search to navigate complex, deeply nested web environments.
It systematically expands web graphs by managing exploration frontiers and confirming task feasibility with modules like NavExplorer and FeasibilityChecker.
On the WebArena benchmark, a 7B model fine-tuned on the collected data achieves state-of-the-art results among sub-10B open-weights models, with gains on deep navigation tasks.

Go-Browse is a method for training web agents capable of structured exploration across web environments, with an emphasis on scalable, diverse data collection. By reframing agent exploration as an explicit graph search, Go-Browse enables efficient and comprehensive coverage of previously unseen web sites, supporting the development and fine-tuning of web agents that demonstrate improved performance on complex, deeply nested navigation tasks. The methodology is instantiated on the WebArena benchmark, yielding state-of-the-art results for open-weights models in the sub-10B parameter regime (Gandhi et al., 4 Jun 2025).

1. Formal Environment and Problem Setting

The underlying environment is modeled as a deterministic or stochastic transition system. Each state $s \in S$ comprises the current goal or task description $g$, a flattened accessibility tree (DOM snapshot) of the web page, the history of past actions and execution errors, and a list of available browser primitives (e.g., click, fill, scroll, goto). Actions $a_t$ are atomic browser calls such as click(elemID), scroll(x,y), fill(elemID,text), goto(url), send_msg_to_user(msg), report_infeasible(reason), executed via a browser environment simulator. The transition function $T$ maps a state and an action to the next state: $s_{t+1} = T(s_t, a_t)$.

Each trajectory $\tau = (s_1, a_1, \dots, s_T, a_T)$ is evaluated against the goal $g$ by a binary reward model $R(g, \tau) \in \{0,1\}$, marking success (1) or failure (0). A central research challenge addressed by Go-Browse is efficient exploration: agents often fail to discover semantically significant or deeply nested web pages, instead repeating unproductive action sequences when unfamiliar with the environment.
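The setting above can be made concrete with a minimal sketch. The class names (`State`, `Action`), the field names, and the toy `url_reward` function are hypothetical and chosen only for illustration; the source does not specify how the reward model $R$ is implemented, so the URL check below is merely a stand-in with the same binary signature.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Action:
    """One atomic browser call, e.g. click(elemID) or goto(url)."""
    name: str
    args: Tuple[str, ...] = ()


@dataclass
class State:
    """Mirrors the four state components listed in the text."""
    goal: str                                          # task description g
    axtree: str                                        # flattened accessibility tree
    history: List[str] = field(default_factory=list)   # past actions and errors
    primitives: Tuple[str, ...] = ("click", "fill", "scroll", "goto")


# A trajectory is a sequence of (state, action) pairs.
Trajectory = List[Tuple[State, Action]]
# A reward model maps (goal, trajectory) to 0 or 1.
RewardModel = Callable[[str, Trajectory], int]


def url_reward(target_url: str) -> RewardModel:
    """Toy reward (assumption): success iff the last action is goto(target_url)."""
    def reward(goal: str, trajectory: Trajectory) -> int:
        if not trajectory:
            return 0
        _, last_action = trajectory[-1]
        return int(last_action.name == "goto" and last_action.args == (target_url,))
    return reward


goal = "Open the orders page"
s1 = State(goal=goal, axtree="[1] link 'Orders'")
tau = [(s1, Action("goto", ("/account/orders",)))]
R = url_reward("/account/orders")
print(R(goal, tau))
```

Keeping the reward behind a plain callable means the toy check can later be swapped for whatever judge a real pipeline uses without touching the trajectory types.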

2. Structured Exploration via Graph Construction

Go-Browse operationalizes structured exploration by representing the web environment as a graph $G = (V, E)$, where nodes $V$ correspond to unique URLs (or page states) and edges $E$ are navigation transitions between them. Data collection proceeds by iteratively expanding this graph:

Graph expansion: At each discovered node $v$, the NavExplorer and PageExplorer modules propose candidate tasks $g_i$. A task that the FeasibilityChecker confirms as feasible and that leads from $v$ to a previously unknown URL $v'$ yields a new edge $e = (v \rightarrow v')$.
Frontier management: A frontier $F \subseteq V$ contains nodes that have been discovered but not yet thoroughly explored. Exploration repeatedly selects a node from $F$ by a cost heuristic such as minimum depth (breadth-first), i.e., $v \leftarrow \arg\min_{u \in F} \mathrm{depth}(u)$, or more generally $v \leftarrow \arg\min_{u \in F} [\mathrm{depth}(u) + h(u)]$ for a learned heuristic $h$.
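The frontier selection rule can be sketched as a priority queue keyed on $\mathrm{depth}(u) + h(u)$. The `Frontier` class and its method names are hypothetical; with the default heuristic $h \equiv 0$ the queue reduces to the breadth-first rule, and an insertion counter breaks ties in first-in, first-out order.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class Frontier:
    """Min-priority frontier: pops the node minimizing depth(u) + h(u)."""

    def __init__(self, heuristic: Callable[[str], float] = lambda url: 0.0):
        self._heap: List[Tuple[float, int, str, int]] = []
        self._counter = itertools.count()  # tie-breaker: insertion order
        self._h = heuristic

    def push(self, url: str, depth: int) -> None:
        priority = depth + self._h(url)
        heapq.heappush(self._heap, (priority, next(self._counter), url, depth))

    def pop(self) -> Tuple[str, int]:
        _, _, url, depth = heapq.heappop(self._heap)
        return url, depth

    def __bool__(self) -> bool:
        return bool(self._heap)


# Breadth-first behaviour with h = 0.
bfs = Frontier()
bfs.push("/catalog/shoes", 2)
bfs.push("/", 0)
bfs.push("/catalog", 1)
bfs_order = [bfs.pop()[0] for _ in range(3)]

# A hand-written heuristic (assumption) that penalizes admin pages.
penalized = Frontier(heuristic=lambda url: 5.0 if url.startswith("/admin") else 0.0)
penalized.push("/admin", 1)
penalized.push("/catalog/shoes", 2)
first_with_penalty = penalized.pop()[0]
```

Here the heuristic is a fixed function for simplicity; a learned $h$ would plug into the same `heuristic` argument.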

The full Go-Browse algorithm is outlined in the following pseudocode:

procedure Go-Browse(Websites W)
  Initialize dataset D ← ∅; Graph G=(V,E) ← (∅,∅); Frontier F ← ∅
  for each site W_i in W do
    v_root ← root URL of W_i
    V ← V ∪ {v_root};  F ← F ∪ {v_root}
    while F ≠ ∅ do
      v ← SelectAndRemoveFromFrontier(F)
      s_v ← GetCurrentState(v)
      G_nav ← NavExplorer.propose_tasks(s_v)
      G_local ← PageExplorer.propose_tasks(s_v)
      G_prop ← G_nav ∪ G_local
      G_feas ← ∅
      for each task g in G_prop do
        (is_feas, τ_fc, v_new) ← FeasibilityChecker.check_and_collect(g,s_v,R,N_max)
        if is_feas then
          D ← D ∪ {(g, τ_fc)}; G_feas ← G_feas ∪ {g}
          if v_new ∉ V then
            V ← V ∪ {v_new};  E ← E ∪ {(v → v_new)}; F ← F ∪ {v_new}
          end if
        end if
      end for
      for each g in G_feas do
        T_pref  ← Solvers.sample(g, s_v, R, prefixed=True)
        T_unpref ← Solvers.sample(g, v_root, R, prefixed=False)
        D ← D ∪ { (g, τ) | τ ∈ T_pref ∪ T_unpref }
      end for
    end while
  end for
  return D
end procedure

Because each discovered URL enters the frontier only once and later serves as a start state for prefixed sampling, earlier exploration is reused rather than repeated from the root, which reduces redundant sampling.
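The outer loop of the pseudocode can be illustrated on a toy site. This is a simplified sketch under stated assumptions, not the authors' implementation: `TOY_SITE`, `propose_tasks`, and `check_feasible` are hypothetical stand-ins for the web environment, the NavExplorer/PageExplorer modules, and the FeasibilityChecker, and the solver sampling step is omitted. As in the pseudocode, an edge is recorded only when the target URL is new.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical toy site: each URL maps to the URLs reachable by one task.
TOY_SITE: Dict[str, List[str]] = {
    "/": ["/catalog", "/account"],
    "/catalog": ["/catalog/shoes", "/"],
    "/account": ["/account/orders"],
    "/catalog/shoes": [],
    "/account/orders": [],
}

Task = Tuple[str, str]  # (goal text, target URL)


def propose_tasks(url: str) -> List[Task]:
    """Stand-in for the explorers: one task per link, plus one dead link."""
    tasks = [(f"Go from {url} to {target}", target) for target in TOY_SITE[url]]
    missing = url.rstrip("/") + "/missing"
    tasks.append((f"Go from {url} to {missing}", missing))  # infeasible on purpose
    return tasks


def check_feasible(task: Task) -> bool:
    """Stand-in for the FeasibilityChecker: feasible iff the target exists."""
    return task[1] in TOY_SITE


def go_browse(root: str):
    dataset: List[Tuple[str, List[str]]] = []  # (goal, toy trajectory of URLs)
    visited: Set[str] = {root}
    edges: Set[Tuple[str, str]] = set()
    frontier = deque([root])                   # FIFO order = minimum depth first
    while frontier:
        v = frontier.popleft()
        for task in propose_tasks(v):
            if not check_feasible(task):
                continue                       # infeasible tasks are discarded
            goal, v_new = task
            dataset.append((goal, [v, v_new]))
            if v_new not in visited:           # only new URLs extend the graph
                visited.add(v_new)
                edges.add((v, v_new))
                frontier.append(v_new)
    return dataset, visited, edges


dataset, visited, edges = go_browse("/")
print(len(dataset), len(visited), len(edges))
```

In this toy run the link from `/catalog` back to `/` still contributes a feasible task to the dataset but adds no edge, since the root was already discovered.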

3. Dataset Construction and Composition

Go-Browse was applied to the five self-hosted WebArena domains (Shopping Admin, Shopping, Reddit, Gitlab, Map), targeting 20 distinct URLs per domain (100 in total). The resulting Go-Browse-WA dataset comprises:

Statistic	Value
Successful trajectories	9,504
Unsuccessful trajectories	17,245
Total trajectories	26,749
Successful steps	39,339
Failed steps	157,123
Total steps	196,462
Unique task descriptions	3,422

Of the successful trajectories, 36.6% were sampled by GPT-4o-mini, 33.9% by Claude-3.7-Sonnet, and 29.5% by Qwen-2.5-7B-Instruct.
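A few derived quantities follow directly from the table. The sketch below only recomputes them from the reported figures; the variable names are illustrative and no new data is introduced.

```python
# Figures copied from the Go-Browse-WA statistics table.
successful_traj = 9_504
unsuccessful_traj = 17_245
successful_steps = 39_339
failed_steps = 157_123

# Totals should match the table's "Total" rows.
total_traj = successful_traj + unsuccessful_traj
total_steps = successful_steps + failed_steps

# Derived quantities (not reported in the table itself).
trajectory_success_rate = successful_traj / total_traj
avg_steps_success = successful_steps / successful_traj
avg_steps_failed = failed_steps / unsuccessful_traj

# Shares of successful trajectories by sampling model, in percent.
shares = {"GPT-4o-mini": 36.6, "Claude-3.7-Sonnet": 33.9, "Qwen-2.5-7B-Instruct": 29.5}
share_sum = sum(shares.values())

print(f"success rate: {trajectory_success_rate:.1%}")
print(f"avg steps (success / failure): {avg_steps_success:.2f} / {avg_steps_failed:.2f}")
```

Roughly a third of sampled trajectories succeed, and failed trajectories are on average more than twice as long as successful ones.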

Task collection included both "prefixed" (starting from a discovered node $v$) and "unprefixed" (starting from the root) sampling strategies, supporting downstream model robustness and the ability to bootstrap weaker models, particularly for deeper graph nodes.

4. Model Architecture and Training Regimen

Supervised fine-tuning was conducted on the 7B-parameter Qwen-2.5-7B-Instruct LLM, using only successful (goal, trajectory) pairs. Each training instance begins with the user goal, followed by a sequence of $(\text{state} \rightarrow \text{action})$ pairs, and the model is trained to generate the actions autoregressively by minimizing cross-entropy loss. No auxiliary losses were included.
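One way to write this objective, as a sketch in the notation of Section 1, is shown below. The symbols $\pi_\theta$ (the fine-tuned policy) and $D^{+}$ (the subset of the dataset with $R(g,\tau)=1$) are introduced here for illustration and do not appear in the source; the state $s_t$ already contains the goal $g$ and the action history, and the token-level factorization of each action is left implicit.

```latex
\mathcal{L}(\theta)
  \;=\; - \sum_{(g,\tau) \in D^{+}} \; \sum_{t=1}^{|\tau|}
        \log \pi_\theta\!\left(a_t \mid s_t\right),
\qquad
D^{+} = \{(g,\tau) \in D : R(g,\tau) = 1\}.
```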

Training configuration:

Maximum sequence length: 24K tokens
Batch size: 8 (1 per GPU), gradient accumulation: 4 (effective batch: 32)
Learning rate: $2 \times 10^{-5}$, Adam optimizer
2 epochs over the 9,504 successful trajectories
Hardware: 8 × H100 GPUs; total ≈ 40 hours

Additionally, a comparable model was trained on the NNetNav-WA dataset (45K steps), establishing a baseline for evaluation.

5. Empirical Evaluation and Comparative Analysis

Performance was assessed on 812 WebArena test tasks within BrowserGym, using binary task completion rates as the metric. Key results are as follows:

Model	Overall	Admin	Shopping	Reddit	Gitlab	Map
GPT-4o-mini	19.3%	19.2%	19.3%	21.1%	20.9%	15.6%
GPT-4o	37.6%	35.7%	32.3%	50.9%	36.7%	37.5%
Claude-3.7-Sonnet	45.4%	37.4%	37.0%	58.8%	52.0%	47.7%
Qwen-2.5-7B-Instruct	8.3%	7.1%	9.4%	7.9%	8.7%	7.8%
NNetNav-7B	18.8%	14.3%	20.3%	23.7%	19.9%	17.2%
Go-Browse-7B	21.7%	25.3%	22.4%	30.7%	15.3%	17.9%

Go-Browse-7B achieves a 21.7% overall success rate, surpassing GPT-4o-mini by 2.4 percentage points and NNetNav-7B by 2.9 percentage points. Notable observations include:

Tasks with deeper URL paths (depth $\geq 5$) were solved more effectively by Go-Browse-7B.
Prefixed trajectory sampling provided a 20–30% higher success rate on deep-page navigation for Qwen-2.5.
Task diversity is enhanced, reducing the frequency of redundant, shallow navigation episodes relative to NNetNav.
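The headline gaps can be recomputed from the results table. The sketch below copies the reported numbers and distinguishes absolute gaps (percentage points) from relative gains; the helper names are hypothetical.

```python
# Overall success rates (%) copied from the results table.
overall = {
    "GPT-4o-mini": 19.3,
    "GPT-4o": 37.6,
    "Claude-3.7-Sonnet": 45.4,
    "Qwen-2.5-7B-Instruct": 8.3,
    "NNetNav-7B": 18.8,
    "Go-Browse-7B": 21.7,
}

# Per-domain rates (%) for the two fine-tuned 7B models.
domains = ["Admin", "Shopping", "Reddit", "Gitlab", "Map"]
go_browse = dict(zip(domains, [25.3, 22.4, 30.7, 15.3, 17.9]))
nnetnav = dict(zip(domains, [14.3, 20.3, 23.7, 19.9, 17.2]))


def point_gap(model: str, baseline: str) -> float:
    """Absolute gap in percentage points."""
    return round(overall[model] - overall[baseline], 1)


def relative_gain(model: str, baseline: str) -> float:
    """Gap as a percentage of the baseline's success rate."""
    return round(100 * (overall[model] - overall[baseline]) / overall[baseline], 1)


gap_vs_mini = point_gap("Go-Browse-7B", "GPT-4o-mini")
gap_vs_nnetnav = point_gap("Go-Browse-7B", "NNetNav-7B")
gap_vs_base = point_gap("Go-Browse-7B", "Qwen-2.5-7B-Instruct")
rel_vs_nnetnav = relative_gain("Go-Browse-7B", "NNetNav-7B")

# Domains where Go-Browse-7B trails NNetNav-7B.
trailing = [d for d in domains if go_browse[d] < nnetnav[d]]
print(gap_vs_mini, gap_vs_nnetnav, gap_vs_base, trailing)
```

The per-domain comparison shows that the overall lead is not uniform: in the reported table, Go-Browse-7B trails NNetNav-7B on Gitlab while leading on the other four domains.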

6. Limitations and Prospective Extensions

The current Go-Browse experiments are constrained to the five WebArena domains. Broader generalization would require further data from diverse, real-world sites (e.g., e-commerce, banking, news). Fine-tuning presently uses only successful traces; incorporating the roughly 17K unsuccessful trajectories through auxiliary failure signals or RL-style objectives is suggested as a way to improve robustness. Scaling beyond 7B parameters and investigating in-context or retrieval-augmented architectures are logical extensions for narrowing the performance gap to closed-weight frontier models.

A plausible implication is that the explicit graph-based exploration methodology is effective not only for data efficiency but also for supporting modular task generation and enabling sophisticated downstream agent behavior in complex, multi-hop web environments.


References (1)

Go-Browse: Training Web Agents with Structured Exploration (2025)
