Bounds for approximate retrieval when allowing errors

Develop theoretical bounds on the representation capacity (e.g., embedding dimension or related rank measures) required for single-vector embedding models to succeed when approximate retrieval is acceptable, such as correctly capturing only a majority of the top-k combinations rather than all of them.

Background

The theoretical analysis in the paper focuses on exact representation of the binary query relevance (qrel) matrix, yielding sign-rank-based lower bounds on the embedding dimension of single-vector models. In practice, retrieval systems may tolerate some errors, or only need to capture most (not all) of the relevant combinations.

The authors explicitly note they did not provide theory for this approximate setting and call for bounds that quantify the capacity needed when limited errors are acceptable.
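The gap between the exact and approximate settings can be made concrete with a small sketch (the toy qrel matrix, dimensions, and hinge-loss fitting procedure here are illustrative assumptions, not taken from the paper): with a large enough embedding dimension a sign pattern can always be reproduced exactly, while at a smaller dimension one can only ask that a majority of entries be matched.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy qrel matrix with entries in {-1, +1}, where +1 means
# "document j is relevant to query i". Purely illustrative.
n_queries, n_docs = 6, 6
M = rng.choice([-1.0, 1.0], size=(n_queries, n_docs))

def sign_accuracy(U, V, M):
    """Fraction of qrel entries whose sign is reproduced by U @ V.T."""
    return float(np.mean(np.sign(U @ V.T) == M))

# Exact regime: with embedding dimension d = n_docs, the trivial
# factorization U = M, V = I reproduces every sign exactly.
U_exact, V_exact = M.copy(), np.eye(n_docs)
exact_acc = sign_accuracy(U_exact, V_exact, M)

# Approximate regime: force d well below n_docs and fit query/document
# embeddings with a hinge loss; we only ask a majority of entries to match.
d = 2
U = 0.1 * rng.standard_normal((n_queries, d))
V = 0.1 * rng.standard_normal((n_docs, d))
lr = 0.05
for _ in range(2000):
    margins = M * (U @ V.T)
    G = -((margins < 1.0) * M)  # subgradient of sum max(0, 1 - M_ij * <u_i, v_j>)
    # Tuple assignment: both gradients use the pre-update U and V.
    U, V = U - lr * (G @ V), V - lr * (G.T @ U)

approx_acc = sign_accuracy(U, V, M)
print(f"exact (d={n_docs}): {exact_acc:.2f}, approx (d={d}): {approx_acc:.2f}")
```

The open question asked above is, in effect, how `approx_acc` behaves as a function of `d` when only majority (or bounded-error) agreement is required, rather than the exact agreement the paper's sign-rank bounds address.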

References

We also did not show theoretical results for the setting where the user allows some mistakes, e.g. capturing only the majority of the combinations. We leave putting a bound on this scenario to future work and would invite the reader to examine works like \citet{ben2002limitations}.

On the Theoretical Limitations of Embedding-Based Retrieval (2508.21038 - Weller et al., 28 Aug 2025) in Limitations