Scaling Generative Search to Millions of Passages
Develop scalable generative search methods, particularly numeric ID-based approaches such as the Differentiable Search Index (DSI), that operate effectively on corpora containing millions of passages and remain competitive with dual-encoder baselines in both retrieval performance and efficiency.
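To make the numeric ID setting concrete, the following is a minimal sketch of decoding a document identifier one digit at a time, restricted to identifiers that actually exist in the corpus. It is an illustration under stated assumptions, not the DSI implementation: the trie-based constraint is a common technique in generative retrieval, and the names `build_trie`, `allowed_next`, `constrained_greedy_decode`, the `EOS` marker, and the `score` callback (a stand-in for a sequence-to-sequence model's next-symbol score) are all hypothetical.

```python
from typing import Callable, Dict, List

EOS = "$"  # hypothetical end-of-identifier marker, not part of any real ID


def build_trie(doc_ids: List[int]) -> Dict:
    """Build a prefix trie over the decimal digits of each document ID."""
    root: Dict = {}
    for doc_id in doc_ids:
        node = root
        for digit in str(doc_id):
            node = node.setdefault(digit, {})
        node[EOS] = {}  # mark that a complete, valid identifier ends here
    return root


def allowed_next(trie: Dict, prefix: str) -> List[str]:
    """Return the symbols that may follow `prefix` and still reach a valid ID."""
    node = trie
    for digit in prefix:
        if digit not in node:
            return []  # the prefix matches no identifier in the corpus
        node = node[digit]
    return sorted(node.keys())


def constrained_greedy_decode(
    trie: Dict, score: Callable[[str, str], float]
) -> int:
    """Greedily pick the best-scoring allowed symbol until EOS is chosen.

    `score(prefix, symbol)` is an assumed stand-in for a model's score of
    `symbol` given the query and the digits generated so far.
    """
    prefix = ""
    while True:
        candidates = allowed_next(trie, prefix)
        if not candidates:
            raise ValueError("no valid identifier can be decoded")
        best = max(candidates, key=lambda symbol: score(prefix, symbol))
        if best == EOS:
            return int(prefix)
        prefix += best
```

In this sketch the number of decoding steps grows with the number of digits in the identifier, while the trie grows with the corpus itself; a real system would replace `score` with model logits and typically use beam search rather than greedy selection.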
References
It is found that while generative search is competitive with state-of-the-art dual encoders on small corpora, scaling to millions of passages remains an important and unsolved challenge.
A Survey of Generative Search and Recommendation in the Era of Large Language Models (2404.16924, Li et al., 25 Apr 2024), Section 4.2, Document Identifiers (Numeric ID)