Vendor-complete characterization of CLI grep vs. vector performance under increasing distraction

Determine, across the provider-native CLI harnesses (Claude Code, Codex CLI, and Gemini CLI), how the accuracy of grep-based lexical retrieval changes relative to vector-based semantic retrieval as the number of distractor sessions increases, under matched session-limit configurations (e.g., s5, s10, s20, s30, full) on the LongMemEval subset.

Background

Experiment 2 studies how retrieval performance scales as irrelevant sessions (distractors) are added, comparing grep-only and vector-only retrieval under identical session-limit configurations. Results are reported for Chronos, Claude Code, and Gemini CLI, but the Codex CLI rows are incomplete: there is no grep scaling row, and vector results exist only for the full configuration.

Because of these gaps, the authors state that they cannot yet provide a vendor-complete view of how CLI-based grep performance degrades (or “ages”) with increasing distraction relative to CLI-based vector retrieval. Completing this characterization requires filling in the missing Codex CLI configurations so that all three vendors can be compared under matched session limits.
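
The gap can be made concrete by listing which cells of the harness × retrieval method × session-limit grid still lack results. The following is a minimal sketch, not the authors' tooling. It assumes the session limits are exactly the five examples named above (s5, s10, s20, s30, full) and that Claude Code and Gemini CLI are covered in every cell, which the text implies but does not confirm. The names `missing_cells` and `reported` are hypothetical.

```python
from itertools import product

# Grid axes. The session limits are the examples given in the problem
# statement; the real experiment may use a different set (assumption).
HARNESSES = ["Claude Code", "Codex CLI", "Gemini CLI"]
METHODS = ["grep", "vector"]
SESSION_LIMITS = ["s5", "s10", "s20", "s30", "full"]


def missing_cells(reported):
    """Return every (harness, method, limit) cell that has no reported result."""
    grid = product(HARNESSES, METHODS, SESSION_LIMITS)
    return [cell for cell in grid if cell not in reported]


# Coverage as described in the Background text. Full coverage for
# Claude Code and Gemini CLI is an assumption made for illustration.
reported = {
    (harness, method, limit)
    for harness in ["Claude Code", "Gemini CLI"]
    for method in METHODS
    for limit in SESSION_LIMITS
}
# Codex CLI: only the full configuration for vector, no grep scaling row.
reported.add(("Codex CLI", "vector", "full"))

gaps = missing_cells(reported)
for harness, method, limit in gaps:
    print(f"missing: {harness} / {method} / {limit}")
```

Under these assumptions, every gap falls in the Codex CLI rows: all five grep configurations and the four intermediate vector configurations.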

References

Finally, incomplete rows (Codex vector intermediates; no Codex grep scaling row yet) mean we cannot yet state a vendor-complete picture of how "CLI grep" ages with distraction relative to "CLI vector" under matched caps.

Is Grep All You Need? How Agent Harnesses Reshape Agentic Search (2605.15184 - Sen et al., 14 May 2026) in Section 4.2.4 (Experiment 2: Discussion)