
Pre-training with Aspect-Content Text Mutual Prediction for Multi-Aspect Dense Retrieval (2308.11474v1)

Published 22 Aug 2023 in cs.IR

Abstract: Grounded on pre-trained language models (PLMs), dense retrieval has been studied extensively on plain text. In contrast, there has been little research on retrieving data with multiple aspects using dense models. In scenarios such as product search, aspect information plays an essential role in relevance matching, e.g., category: Electronics, Computers, and Pet Supplies. A common way of leveraging aspect information for multi-aspect retrieval is to introduce an auxiliary classification objective, i.e., using item contents to predict the annotated value IDs of item aspects. However, by learning the value embeddings from scratch, this approach may not sufficiently capture the various semantic similarities between the values. To address this limitation, we leverage the aspect information as text strings rather than class IDs during pre-training so that their semantic similarities can be naturally captured in the PLMs. To facilitate effective retrieval with the aspect strings, we propose mutual prediction objectives between the text of the item aspect and content. In this way, our model makes fuller use of aspect information than conducting undifferentiated masked language modeling (MLM) on the concatenated text of aspects and content. Extensive experiments on two real-world datasets (product and mini-program search) show that our approach can outperform competitive baselines that either treat aspect values as classes or conduct the same MLM on aspect and content strings. Code and the related dataset will be available at https://github.com/sunxiaojie99/ATTEMPT.
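The mutual prediction idea in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the whitespace tokenization, special tokens, and whole-side masking below are simplifying assumptions standing in for the paper's actual pre-training setup. It builds, for one item, the two training directions: predict masked aspect tokens from visible content, and predict masked content tokens from visible aspect strings.

```python
# Hedged sketch of aspect-content mutual prediction (assumptions: whitespace
# tokenization, [MASK]/[SEP] special tokens, masking one whole side at a time).

MASK = "[MASK]"
SEP = "[SEP]"

def mutual_prediction_examples(aspects, content):
    """Build (input_tokens, label_tokens) pairs for both directions:
    content -> aspect (aspect side masked) and aspect -> content
    (content side masked). Unmasked positions get label None (ignored)."""
    aspect_toks = [tok for a in aspects for tok in a.split()]
    content_toks = content.split()

    # Direction 1: aspect tokens are masked and predicted from the content.
    inp1 = [MASK] * len(aspect_toks) + [SEP] + content_toks
    lab1 = aspect_toks + [None] * (1 + len(content_toks))

    # Direction 2: content tokens are masked and predicted from the aspects.
    inp2 = aspect_toks + [SEP] + [MASK] * len(content_toks)
    lab2 = [None] * (len(aspect_toks) + 1) + content_toks

    return (inp1, lab1), (inp2, lab2)

# Usage: one item with an "Electronics" category aspect.
(ex1, ex2) = mutual_prediction_examples(["Electronics"], "wireless mouse")
```

Compared with undifferentiated MLM over the concatenated string, this scheme forces the model to reconstruct each side from the other, which is the intuition the abstract describes.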

Authors (8)
  1. Xiaojie Sun (6 papers)
  2. Keping Bi (41 papers)
  3. Jiafeng Guo (161 papers)
  4. Xinyu Ma (49 papers)
  5. Yixing Fan (1 paper)
  6. Hongyu Shan (2 papers)
  7. Qishen Zhang (7 papers)
  8. Zhongyi Liu (19 papers)
Citations (4)
