Is Grep All You Need? How Agent Harnesses Reshape Agentic Search
This lightning talk examines a counterintuitive finding in modern AI systems: simple lexical search often outperforms sophisticated semantic retrieval when agents interact with tools. The presentation explores how agent harnesses, tool-calling mechanisms, and retrieval strategies interact in complex ways that challenge conventional wisdom about scaling retrieval systems. Through controlled experiments on conversational question answering, the research reveals that the orchestration layer matters as much as the retrieval algorithm itself.

Script
When AI agents need to find information, we assume they need sophisticated semantic search. But what if simple grep search works better? This paper challenges that assumption by showing retrieval success depends less on the algorithm and more on how agents orchestrate their tools.
The researchers discovered that agent harness architecture exerts as much influence on retrieval performance as the choice between grep and vector search. The harness controls how queries are generated, how tools are invoked, and crucially, whether results appear inline in context or as files the agent must explicitly read back.
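To make the two delivery paths concrete, here is a minimal sketch in Python. It is not the harness used in the paper. The function names (`grep_sessions`, `deliver_inline`, `deliver_as_file`), the results file name, and the toy sessions are all assumptions made for illustration.

```python
import re
import tempfile
from pathlib import Path

def grep_sessions(pattern, sessions):
    """Return (session_id, line) pairs for lines matching the pattern, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for sid, text in sessions.items():
        for line in text.splitlines():
            if rx.search(line):
                hits.append((sid, line))
    return hits

def deliver_inline(hits):
    """Inline delivery: the matches themselves are the tool result the agent sees."""
    return "\n".join(f"{sid}: {line}" for sid, line in hits)

def deliver_as_file(hits, out_dir):
    """File-based delivery: matches go to disk and the agent only gets a pointer back.

    The agent must issue a second, explicit read call to see the evidence.
    """
    path = Path(out_dir) / "grep_results.txt"  # hypothetical file name
    path.write_text(deliver_inline(hits), encoding="utf-8")
    return f"Wrote {len(hits)} matches to {path}"

# Toy conversational corpus (invented for this sketch).
sessions = {
    "s1": "User: I adopted a cat named Miso.\nAssistant: Congratulations!",
    "s2": "User: My flight to Oslo leaves on Friday.\nAssistant: Safe travels.",
}

hits = grep_sessions(r"miso", sessions)
inline_result = deliver_inline(hits)

with tempfile.TemporaryDirectory() as d:
    file_result = deliver_as_file(hits, d)
    # The extra step the agent has to remember to take in the file-based path.
    read_back = (Path(d) / "grep_results.txt").read_text(encoding="utf-8")
```

In the inline path the evidence lands in context immediately. In the file-based path the same evidence exists, but the agent only sees it if it follows up with a read, which is one plausible place for accuracy to be lost.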
Across every harness and model pair tested, inline grep retrieval uniformly surpassed vector search, often by 10 percentage points or more. But when switching to programmatic file-based delivery, the rankings reshuffled completely. In the worst case, grep accuracy collapsed from 93 percent to just 55 percent, while vector search sometimes pulled ahead.
When the authors flooded the corpus with irrelevant distractor sessions to simulate real-world noise, both retrieval methods showed resilience at first. But the relative rankings were not stable. Vector search often held up better at modest noise levels, yet grep closed or reversed the gap as corpus size grew and span-centric evidence became more critical.
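A rough sketch of the kind of noise injection described here, again in Python. The helper names (`add_distractors`, `lexical_hits`), the distractor texts, and the corpus sizes are assumptions for illustration only. They do not reproduce the paper's data or protocol.

```python
import random

def add_distractors(corpus, distractor_pool, k, seed=0):
    """Return a copy of the corpus with k distractor sessions sampled from the pool."""
    rng = random.Random(seed)  # fixed seed so a given noise level is reproducible
    noisy = dict(corpus)
    for i, text in enumerate(rng.sample(distractor_pool, k)):
        noisy[f"distractor_{i}"] = text
    return noisy

def lexical_hits(query_term, corpus):
    """Case-insensitive substring search, standing in for a grep tool call."""
    term = query_term.lower()
    return sorted(sid for sid, text in corpus.items() if term in text.lower())

# One gold session plus a small invented pool of unrelated sessions.
corpus = {"gold": "User: I adopted a cat named Miso last spring."}
pool = [
    "User: What is a good recipe for miso soup?",  # shares a literal term with the gold session
    "User: Remind me to renew my passport.",
    "User: My dog needs a new leash.",
    "User: Book a table for two on Saturday.",
]

# Grow the corpus and watch how many candidates the agent has to sift through.
for k in (0, 2, 4):
    noisy = add_distractors(corpus, pool, k)
    hits = lexical_hits("miso", noisy)
    print(f"distractors={k} corpus_size={len(noisy)} hits={hits}")
```

The gold session stays retrievable at every noise level in this toy setup. What changes is how many lexically similar but irrelevant sessions come back alongside it, which the agent then has to rule out.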
These results carry a sharp implication: you cannot evaluate retrieval in isolation. The harness, the delivery path, and the agent backbone form an inseparable system. Reporting that you used vector search tells us almost nothing without specifying how the agent orchestrates and consumes those results.
Lexical retrieval remains a powerful, often dominant baseline when agents work with literal spans, but only if the orchestration layer supports it. To explore more findings like this and create your own research videos, visit EmergentMind.com.