Bounds for approximate retrieval when allowing errors
Develop theoretical bounds on the representation capacity (e.g., embedding dimension or related rank measures) required for single-vector embedding models to succeed when approximate retrieval is acceptable, such as correctly capturing only a majority of the top-k combinations rather than all of them.
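To make the relaxed success criterion concrete, the following minimal sketch measures what fraction of top-k combinations a given set of embeddings captures. It assumes dot-product scoring and counts a combination as captured only when the retrieved top-k set equals it exactly. The helper names (`top_k`, `captured_fraction`), the toy embeddings, and the 0.5 majority threshold are illustrative assumptions, not definitions from the cited paper.

```python
from itertools import combinations


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k(query, docs, k):
    """Indices of the k documents with the highest dot-product score."""
    ranked = sorted(range(len(docs)), key=lambda i: dot(query, docs[i]), reverse=True)
    return set(ranked[:k])


def captured_fraction(queries, docs, relevant_sets, k):
    """Fraction of queries whose retrieved top-k set equals the target combination."""
    hits = sum(top_k(q, docs, k) == set(rel) for q, rel in zip(queries, relevant_sets))
    return hits / len(relevant_sets)


n, k = 3, 2
# Every k-subset of the n documents is a target combination: (0,1), (0,2), (1,2).
relevant_sets = list(combinations(range(n), k))

# Toy 3-dimensional embeddings: one axis per document, query = indicator of its subset.
docs_3d = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
queries_3d = [[1 if i in rel else 0 for i in range(n)] for rel in relevant_sets]
full = captured_fraction(queries_3d, docs_3d, relevant_sets, k)

# Toy 1-dimensional embeddings: a query can only rank documents in ascending or
# descending order of their value, so (0, 2) is missed with these document vectors.
docs_1d = [[1.0], [2.0], [3.0]]
queries_1d = [[-1.0], [1.0], [1.0]]
partial = captured_fraction(queries_1d, docs_1d, relevant_sets, k)

# Relaxed criterion (assumed threshold): succeed if a strict majority is captured.
majority_ok = partial > 0.5
```

In this toy case the 3-dimensional embeddings capture every combination, while the 1-dimensional ones capture two of three, which fails the exact criterion but passes the majority criterion. The open question is how far the required dimension can drop, in general, once such a tolerance is allowed.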
References
We also did not show theoretical results for the setting where the user allows some mistakes, e.g. capturing only the majority of the combinations. We leave putting a bound on this scenario to future work and would invite the reader to examine works like \citet{ben2002limitations}.
Source: On the Theoretical Limitations of Embedding-Based Retrieval (2508.21038 - Weller et al., 28 Aug 2025), in Limitations