Characterize which top-k combination patterns fail for single-vector embeddings

Characterize the classes of top-k document set combinations (equivalently, structural patterns in the binary query relevance matrix) that single-vector embedding models provably cannot represent, and identify the specific properties that make failure unavoidable regardless of training.

Background

While the paper proves that some combinations cannot be represented by single-vector embeddings (via sign-rank arguments), it does not identify which specific patterns or families of combinations cause failure.

The authors explicitly state they cannot prove a priori which types of combinations will fail, leaving open a structural characterization of unrepresentable patterns.
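
To make the object of study concrete, the following minimal Python sketch builds the binary query relevance matrix for every top-k combination of a small document set and checks whether a given set of embeddings represents it. The helper names `qrel_matrix` and `realizes` are illustrative and do not come from the paper. The criterion used here (every relevant document must strictly outscore every irrelevant one under the dot product) is an assumed formalization of "representing" a top-k set. The sketch only tests one candidate embedding at a time; it does not prove that a pattern is unrepresentable.

```python
from itertools import combinations


def qrel_matrix(n_docs, k):
    """Binary relevance matrix with one row (query) per k-subset of documents."""
    rows = []
    for subset in combinations(range(n_docs), k):
        rows.append([1 if j in subset else 0 for j in range(n_docs)])
    return rows


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def realizes(queries, docs, qrels):
    """True if, for every query, each relevant doc outscores each irrelevant doc."""
    for q, row in zip(queries, qrels):
        scores = [dot(q, d) for d in docs]
        relevant = [s for s, r in zip(scores, row) if r == 1]
        irrelevant = [s for s, r in zip(scores, row) if r == 0]
        if relevant and irrelevant and min(relevant) <= max(irrelevant):
            return False
    return True


n_docs, k = 4, 2
qrels = qrel_matrix(n_docs, k)  # all C(4, 2) = 6 top-2 combinations

# Dimension d = n_docs: one-hot documents, each query is the sum of its relevant docs.
docs_onehot = [[1.0 if i == j else 0.0 for i in range(n_docs)] for j in range(n_docs)]
queries_sum = [[float(r) for r in row] for row in qrels]
full_dim_ok = realizes(queries_sum, docs_onehot, qrels)

# Dimension d = 1: documents on a line, one fixed (hypothetical) scalar query per row.
docs_1d = [[1.0], [2.0], [3.0], [4.0]]
queries_1d = [[1.0] for _ in qrels]
one_dim_ok = realizes(queries_1d, docs_1d, qrels)

print(full_dim_ok, one_dim_ok)
```

With one-hot documents the check passes, which reflects the fact that dimension equal to the number of documents is always sufficient. The one-dimensional candidate fails, and in fact no one-dimensional embedding can succeed here: a scalar query only ever ranks documents in ascending or descending order of their coordinates, so it cannot select a pair such as the smallest and largest document together. The open question concerns the regime between these two extremes.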

References

"We have showed the theoretical connection that proves that some combinations cannot be represented by embedding models, however, we cannot prove apriori which types of combinations they will fail on."

— On the Theoretical Limitations of Embedding-Based Retrieval (2508.21038 - Weller et al., 28 Aug 2025) in Limitations
