Extend single-vector theoretical limits to multi-vector and other architectures

Extend the sign-rank-based theoretical framework for representational limits, developed for single-vector embedding models with dot-product scoring, to multi-vector retrieval architectures and other non-single-vector settings. The goal is to establish analogous lower and upper bounds on the representation capacity these architectures require.

Background

The paper’s theoretical results establish lower and upper bounds on representational capacity for single-vector embedding models using sign-rank and related rank notions. These results do not directly apply to other architectures such as multi-vector models.
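
One way to see why the single-vector bounds do not transfer directly is to compare the score matrices the two architectures can produce. The sketch below is only an illustration under stated assumptions: it uses ordinary matrix rank as a simpler stand-in for sign-rank, and it assumes a ColBERT-style late-interaction (MaxSim) scorer as the multi-vector model, which the source does not specify. The function names and toy vectors are hypothetical.

```python
import numpy as np

def single_vector_scores(Q, D):
    """Dot-product scores. Q is (m, d), D is (n, d); returns (m, n)."""
    return Q @ D.T

def maxsim_scores(Q_tokens, D_tokens):
    """Assumed late-interaction (MaxSim) scorer, not taken from the source.

    Q_tokens is (m, kq, d), D_tokens is (n, kd, d); returns (m, n).
    Each query token takes its best-matching document token, and the
    per-token maxima are summed.
    """
    # sims[i, j, a, b] = <Q_tokens[i, a], D_tokens[j, b]>
    sims = np.einsum("iad,jbd->ijab", Q_tokens, D_tokens)
    return sims.max(axis=3).sum(axis=2)

d = 1  # embedding dimension shared by both toy models

# Single-vector model: one d-dimensional vector per query and per document.
Q = np.array([[1.0], [-1.0]])
D = np.array([[1.0], [2.0]])
S_single = single_vector_scores(Q, D)

# Multi-vector model: two d-dimensional token vectors per query and document.
Q_tokens = np.array([[[1.0], [0.0]],
                     [[0.0], [-1.0]]])
D_tokens = np.array([[[1.0], [1.0]],
                     [[2.0], [0.0]]])
S_multi = maxsim_scores(Q_tokens, D_tokens)

# A product of an (m, d) and a (d, n) matrix has rank at most d.
rank_single = np.linalg.matrix_rank(S_single)
# The max inside MaxSim is nonlinear, so this bound need not hold.
rank_multi = np.linalg.matrix_rank(S_multi)

print("single-vector rank:", rank_single)
print("multi-vector rank:", rank_multi)
```

In this toy case the dot-product score matrix cannot exceed rank d, while the MaxSim score matrix does, even though every token vector has the same dimension d. This shows only that the rank-based argument breaks for this scorer; it does not establish what the correct bounds for multi-vector models are, which is the open question here.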

The authors provide empirical evidence that multi-vector models behave differently on the LIMIT dataset but leave a formal extension of the theory to these architectures as future work.

References

Although our experiments provide theoretical insight for the most common type of embedding model (single vector) they do not hold necessarily for other architectures, such as multi-vector models. Although we showed initial empirical results with non-single vector models, we leave it to future work to extend our theoretical connections to these settings.

— On the Theoretical Limitations of Embedding-Based Retrieval (2508.21038 - Weller et al., 28 Aug 2025) in Limitations
