Effective Fuzzy Retrieval under Semantic Ambiguity

Develop methods that enable effective fuzzy retrieval under semantic ambiguity in open-web settings, so that LLM-based search agents can reliably retrieve the single most relevant webpage for vague, multi-constraint queries such as those in the Needle in the Web benchmark.

Background

The paper introduces Needle in the Web, a benchmark focused on fuzzy exploratory search where agents must retrieve a single webpage satisfying multiple vague, masked criteria drawn from real web content. Unlike multi-hop factoid QA benchmarks, this task emphasizes semantic alignment under ambiguity and full-webpage retrieval.

Empirical evaluations of leading closed- and open-source LLM-based agents show low overall accuracy and inconsistent performance across domains and difficulty levels. These results point to a gap between current retrieval capabilities and the demands of fuzzy exploratory search, which motivates the open problem stated explicitly in the paper's abstract.

References

These findings reveal that Needle in the Web presents a significant challenge for current search systems and highlights the open problem of effective fuzzy retrieval under semantic ambiguity.

Wang et al., "Needle in the Web: A Benchmark for Retrieving Targeted Web Pages in the Wild," arXiv:2512.16553, 18 Dec 2025 (Abstract).