Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
106 tokens/sec
Gemini 2.5 Pro Premium
53 tokens/sec
GPT-5 Medium
26 tokens/sec
GPT-5 High Premium
27 tokens/sec
GPT-4o
109 tokens/sec
DeepSeek R1 via Azure Premium
91 tokens/sec
GPT OSS 120B via Groq Premium
515 tokens/sec
Kimi K2 via Groq Premium
213 tokens/sec
2000 character limit reached

Benchmarking Deep Search over Heterogeneous Enterprise Data (2506.23139v1)

Published 29 Jun 2025 in cs.CL and cs.AI

Abstract: We present a new benchmark for evaluating Deep Search--a realistic and complex form of retrieval-augmented generation (RAG) that requires source-aware, multi-hop reasoning over diverse, sparsed, but related sources. These include documents, meeting transcripts, Slack messages, GitHub, and URLs, which vary in structure and often contain human-to-human interactions. We build it using a synthetic data pipeline that simulates business workflows across product planning, development, and support stages, generating interconnected content with realistic noise and multi-hop questions with guaranteed ground-truth answers. We release our benchmark with both answerable and unanswerable queries, and retrieval pool of 39,190 enterprise artifacts, enabling fine-grained evaluation of long-context LLM and RAG systems. Our experiments reveal that even the best-performing agentic RAG methods achieve an average performance score of 32.96 on our benchmark. With further analysis, we highlight retrieval as the main bottleneck: existing methods struggle to conduct deep searches and retrieve all necessary evidence. Consequently, they often reason over partial context, leading to significant performance degradation.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube