Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
38 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Semantic-Enhanced Differentiable Search Index Inspired by Learning Strategies (2305.15115v1)

Published 24 May 2023 in cs.IR

Abstract: Recently, a new paradigm called Differentiable Search Index (DSI) has been proposed for document retrieval, wherein a sequence-to-sequence model is learned to directly map queries to relevant document identifiers. The key idea behind DSI is to fully parameterize traditional ``index-retrieve'' pipelines within a single neural model, by encoding all documents in the corpus into the model parameters. In essence, DSI needs to resolve two major questions: (1) how to assign an identifier to each document, and (2) how to learn the associations between a document and its identifier. In this work, we propose a Semantic-Enhanced DSI model (SE-DSI) motivated by Learning Strategies in the area of Cognitive Psychology. Our approach advances original DSI in two ways: (1) For the document identifier, we take inspiration from Elaboration Strategies in human learning. Specifically, we assign each document an Elaborative Description based on the query generation technique, which is more meaningful than a string of integers in the original DSI; and (2) For the associations between a document and its identifier, we take inspiration from Rehearsal Strategies in human learning. Specifically, we select fine-grained semantic features from a document as Rehearsal Contents to improve document memorization. Both the offline and online experiments show improved retrieval performance over prevailing baselines.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (8)
  1. Yubao Tang (6 papers)
  2. Ruqing Zhang (60 papers)
  3. Jiafeng Guo (161 papers)
  4. Jiangui Chen (8 papers)
  5. Zuowei Zhu (1 paper)
  6. Shuaiqiang Wang (68 papers)
  7. Dawei Yin (165 papers)
  8. Xueqi Cheng (274 papers)
Citations (28)