Bottleneck-Minimal Indexing for Generative Document Retrieval (2405.10974v2)

Published 12 May 2024 in cs.IR, cs.AI, cs.CL, and cs.LG

Abstract: We apply an information-theoretic perspective to reconsider generative document retrieval (GDR), in which a document $x \in X$ is indexed by $t \in T$, and a neural autoregressive model is trained to map queries $Q$ to $T$. GDR can be considered to involve information transmission from documents $X$ to queries $Q$, with the requirement to transmit more bits via the indexes $T$. By applying Shannon's rate-distortion theory, the optimality of indexing can be analyzed in terms of the mutual information, and the design of the indexes $T$ can then be regarded as a *bottleneck* in GDR. After reformulating GDR from this perspective, we empirically quantify the bottleneck underlying GDR. Finally, using the NQ320K and MARCO datasets, we evaluate our proposed bottleneck-minimal indexing method in comparison with various previous indexing methods, and we show that it outperforms those methods.
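
To make the abstract's framing concrete: in the classical information bottleneck of Tishby et al. (reference 48 below), one chooses an encoding $p(t|x)$ to minimize $I(X;T) - \beta I(T;Q)$, i.e., an index $T$ that compresses the documents $X$ while preserving what is informative about the queries $Q$. The sketch below is illustrative only, not the paper's algorithm: it assigns cluster-based indexes to toy document embeddings via k-means (a common indexing choice in prior GDR systems) and computes a plug-in estimate of the mutual information between a query's nearest index and its target index. The toy data, cluster count, and all variable names are hypothetical.

```python
# Illustrative sketch (assumptions: toy Gaussian embeddings, k-means
# indexing, plug-in MI estimator). Not the paper's bottleneck-minimal method.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy stand-ins: 1000 "document" embeddings; each document gets 3
# "queries" perturbed around it.
docs = rng.normal(size=(1000, 64))
queries = np.repeat(docs, 3, axis=0) + 0.5 * rng.normal(size=(3000, 64))
doc_of_query = np.repeat(np.arange(1000), 3)

# Indexing: map each document x in X to an index t in T via k-means.
K = 32
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(docs)
t_of_doc = kmeans.labels_              # t in T for each x in X
t_of_query = t_of_doc[doc_of_query]    # index each query should decode to

# Plug-in estimate of I(Q;T): discretize each query by its nearest index
# centroid, then compare against the target index.
q_cluster = kmeans.predict(queries)
joint = np.zeros((K, K))
for q, t in zip(q_cluster, t_of_query):
    joint[q, t] += 1
joint /= joint.sum()
p_q = joint.sum(axis=1, keepdims=True)   # marginal over query clusters
p_t = joint.sum(axis=0, keepdims=True)   # marginal over indexes
nz = joint > 0
mi = np.sum(joint[nz] * np.log2(joint[nz] / (p_q @ p_t)[nz]))
print(f"I(Q;T) ~ {mi:.2f} bits (ceiling: log2 K = {np.log2(K):.2f})")
```

Under the abstract's rate-distortion view, how much of this mutual information an indexing preserves governs how well an autoregressive model can learn to map $Q$ to $T$: an indexing whose target assignments are easy to predict from queries makes the retrieval task easier to learn.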

References (57)
  1. Berger, T. Rate distortion theory for sources with abstract alphabets and memory. Information and Control, 13:254–273, 1968.
  2. Autoregressive search engines: Generating substrings as document identifiers. In Oh, A. H., Agarwal, A., Belgrave, D., and Cho, K. (eds.), Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=Z4kZxAjg8Y.
  3. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning, pp. 89–96, 2005.
  4. Autoregressive entity retrieval. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=5k8F6UU39V.
  5. Learning to rank: from pairwise approach to listwise approach. In Proceedings of the 24th International Conference on Machine Learning, pp. 129–136, 2007.
  6. Variational deep semantic hashing for text documents. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.  75–84, 2017.
  7. CorpusBrain: Pre-train a generative retrieval model for knowledge-intensive language tasks. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pp. 191–200, 2022.
  8. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the Twentieth Annual Symposium on Computational Geometry, pp. 253–262, 2004.
  9. BERT: Pre-training of deep bidirectional transformers for language understanding. In Burstein, J., Doran, C., and Solorio, T. (eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp.  4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1423. URL https://aclanthology.org/N19-1423.
  10. Multi-modal transformer for video retrieval. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IV 16, pp.  214–229. Springer, 2020.
  11. Recommendation as language processing (RLP): A unified pretrain, personalized prompt & predict paradigm (P5). In Proceedings of the 16th ACM Conference on Recommender Systems, pp. 299–315, 2022.
  12. Gray, R. Vector quantization. IEEE ASSP Magazine, 1(2):4–29, 1984.
  13. Grefenstette, G. Cross-language information retrieval, volume 2. Springer Science & Business Media, 2012.
  14. BLISS: A billion-scale index using iterative re-partitioning. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 486–495, 2022.
  15. Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics), 28(1):100–108, 1979.
  16. Language models as semantic indexers. arXiv preprint arXiv:2310.07815, 2023.
  17. Unsupervised semantic deep hashing. Neurocomputing, 351:19–25, 2019.
  18. Dense passage retrieval for open-domain question answering. In Webber, B., Cohn, T., He, Y., and Liu, Y. (eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.  6769–6781, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.550. URL https://aclanthology.org/2020.emnlp-main.550.
  19. ColBERT: Efficient and effective passage search via contextualized late interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 39–48, 2020.
  20. Natural Questions: A benchmark for question answering research. Transactions of the Association for Computational Linguistics, 7:453–466, 2019.
  21. Learning to rank in generative retrieval. arXiv preprint arXiv:2306.15222, 2023.
  22. Deep unsupervised hashing with latent semantic components. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp.  7488–7496, 2022.
  23. Lloyd, S. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.
  24. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
  25. Knowledge distillation for high dimensional search index. Advances in Neural Information Processing Systems, 36, 2023.
  26. Foundations of statistical natural language processing. MIT Press, 1999.
  27. DSI++: Updating transformer memory with new documents. In Bouamor, H., Pino, J., and Bali, K. (eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp.  8198–8213, Singapore, December 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.emnlp-main.510. URL https://aclanthology.org/2023.emnlp-main.510.
  28. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26, 2013.
  29. Generative retrieval as dense retrieval. arXiv preprint arXiv:2306.11397, 2023.
  30. MS MARCO: A human generated machine reading comprehension dataset. In Besold, T. R., Bordes, A., d’Avila Garcez, A. S., and Wayne, G. (eds.), Proceedings of the Workshop on Cognitive Computation: Integrating neural and symbolic approaches 2016 co-located with the 30th Annual Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, December 9, 2016, volume 1773 of CEUR Workshop Proceedings. CEUR-WS.org, 2016. URL https://ceur-ws.org/Vol-1773/CoCoNIPS_2016_paper9.pdf.
  31. Scalable recognition with a vocabulary tree. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), volume 2, pp. 2161–2168. IEEE, 2006.
  32. From doc2query to doctttttquery. Online preprint, 6:2, 2019.
  33. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551, 2020.
  34. Recommender systems with generative retrieval. Advances in Neural Information Processing Systems, 36, 2023.
  35. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends® in Information Retrieval, 3(4):333–389, 2009.
  36. Semantic hashing. International Journal of Approximate Reasoning, 50(7):969–978, 2009.
  37. On the information bottleneck theory of deep learning. Journal of Statistical Mechanics: Theory and Experiment, 2019(12):124020, 2019.
  38. Shannon, C. E. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379–423, 1948.
  39. Shannon, C. E. Coding theorems for a discrete source with a fidelity criterion. IRE National Convention Record, 4:142–163, 1959.
  40. Slonim, N. The information bottleneck: Theory and applications. PhD thesis, Hebrew University of Jerusalem, Jerusalem, Israel, 2002.
  41. Agglomerative information bottleneck. In Solla, S., Leen, T., and Müller, K. (eds.), Advances in Neural Information Processing Systems, volume 12. MIT Press, 1999. URL https://proceedings.neurips.cc/paper_files/paper/1999/file/be3e9d3f7d70537357c67bb3f4086846-Paper.pdf.
  42. Document clustering using word clusters via the information bottleneck method. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 208–215, 2000.
  43. Geometric clustering using the information bottleneck method. In Advances in Neural Information Processing Systems, 2003.
  44. Learning to tokenize for generative retrieval. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=UKd6dpVGdu.
  45. Semantic-enhanced differentiable search index inspired by learning strategies. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’23, pp.  4904–4913, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9798400701030. doi: 10.1145/3580305.3599903. URL https://doi.org/10.1145/3580305.3599903.
  46. Transformer memory as a differentiable search index. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (eds.), Advances in Neural Information Processing Systems, volume 35, pp.  21831–21843. Curran Associates, Inc., 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/892840a6123b5ec99ebaab8be1530fba-Paper-Conference.pdf.
  47. Deep learning and the information bottleneck principle. In 2015 IEEE Information Theory Workshop (ITW), pp. 1–5. IEEE, 2015.
  48. The information bottleneck method. arXiv preprint physics/0004057, 2000. URL https://api.semanticscholar.org/CorpusID:8936496.
  49. Neural discrete representation learning. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper_files/paper/2017/file/7a98af17e63a0ac09ce2e96d03992fbc-Paper.pdf.
  50. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2008.
  51. Attention is all you need. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
  52. Deep hashing network for unsupervised domain adaptation. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.  5385–5394, 2017. URL https://api.semanticscholar.org/CorpusID:2928248.
  53. A neural corpus indexer for document retrieval. In Oh, A. H., Agarwal, A., Belgrave, D., and Cho, K. (eds.), Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=fSfcEYQP_qc.
  54. NOVO: Learnable and interpretable document identifiers for model-based IR. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, CIKM ’23, pp. 2656–2665, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9798400701245. doi: 10.1145/3583780.3614993. URL https://doi.org/10.1145/3583780.3614993.
  55. Vector quantization: a review. Frontiers of Information Technology & Electronic Engineering, 20(4):507–524, 2019.
  56. Scalable and effective generative information retrieval. arXiv preprint arXiv:2311.09134, 2023.
  57. Ultron: An ultimate retriever on corpus with a model-based indexer. arXiv preprint arXiv:2208.09257, 2022.