CoT-MAE v2: Contextual Masked Auto-Encoder with Multi-view Modeling for Passage Retrieval (2304.03158v1)

Published 5 Apr 2023 in cs.CL, cs.AI, cs.IR, and cs.LG

Abstract: A growing number of techniques have emerged to improve passage retrieval performance. As an effective representation bottleneck pre-training technique, the contextual masked auto-encoder uses a contextual embedding to assist in reconstructing passages. However, it relies on only a single auto-encoding pre-task for dense representation pre-training. This study brings multi-view modeling to the contextual masked auto-encoder. First, multi-view representation uses both dense and sparse vectors, aiming to capture sentence semantics from different aspects. Second, a multi-view decoding paradigm uses both auto-encoding and auto-regressive decoders during representation bottleneck pre-training, providing both reconstructive and generative signals for better contextual representation pre-training. We refer to this multi-view pre-training method as CoT-MAE v2. Through extensive experiments, we show that CoT-MAE v2 is effective and robust on large-scale passage retrieval benchmarks and out-of-domain zero-shot benchmarks.
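
To make the two "multi-view" components in the abstract concrete, below is a minimal PyTorch sketch of the idea: an encoder produces a dense ([CLS]) view and a sparse lexical view of a passage, and two shallow decoders, one auto-encoding and one auto-regressive, reconstruct or generate the contextual passage conditioned on the dense bottleneck. All module names, dimensions, and the loss wiring are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch only: hyperparameters and architecture are assumptions.
VOCAB, DIM, MAX_LEN = 30522, 768, 128

class MultiViewCoTMAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.pos = nn.Embedding(MAX_LEN, DIM)
        enc_layer = nn.TransformerEncoderLayer(DIM, nhead=12, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=12)
        # Shallow decoders conditioned on the contextual [CLS] bottleneck.
        ae_layer = nn.TransformerEncoderLayer(DIM, nhead=12, batch_first=True)
        self.ae_decoder = nn.TransformerEncoder(ae_layer, num_layers=1)   # auto-encoding view
        ar_layer = nn.TransformerDecoderLayer(DIM, nhead=12, batch_first=True)
        self.ar_decoder = nn.TransformerDecoder(ar_layer, num_layers=1)   # auto-regressive view
        self.lm_head = nn.Linear(DIM, VOCAB)

    def views(self, ids):
        """Return the dense ([CLS]) view and a sparse lexical view of a passage."""
        x = self.embed(ids) + self.pos(torch.arange(ids.size(1), device=ids.device))
        h = self.encoder(x)
        dense = h[:, 0]                                          # dense view: [CLS] vector
        sparse = torch.relu(self.lm_head(h)).max(dim=1).values   # sparse view: pooled vocab weights
        return dense, sparse

    def forward(self, passage_ids, context_ids):
        dense, sparse = self.views(passage_ids)
        ctx = self.embed(context_ids) + self.pos(
            torch.arange(context_ids.size(1), device=context_ids.device))
        cond = dense.unsqueeze(1)                                # bottleneck token
        # Auto-encoding decoder: reconstruct the contextual passage from the bottleneck.
        ae_h = self.ae_decoder(torch.cat([cond, ctx], dim=1))[:, 1:]
        ae_logits = self.lm_head(ae_h)
        # Auto-regressive decoder: generate the contextual passage left-to-right.
        L = context_ids.size(1)
        causal = nn.Transformer.generate_square_subsequent_mask(L).to(ctx.device)
        ar_h = self.ar_decoder(ctx, memory=cond, tgt_mask=causal)
        ar_logits = self.lm_head(ar_h)
        return dense, sparse, ae_logits, ar_logits

# Toy usage with random token ids; real pre-training would mask tokens and
# shift targets for the auto-regressive head.
model = MultiViewCoTMAE()
passage = torch.randint(0, VOCAB, (2, 64))
context = torch.randint(0, VOCAB, (2, 64))
dense, sparse, ae_logits, ar_logits = model(passage, context)
loss = F.cross_entropy(ae_logits.reshape(-1, VOCAB), context.reshape(-1)) \
     + F.cross_entropy(ar_logits.reshape(-1, VOCAB), context.reshape(-1))
print(dense.shape, sparse.shape, loss.item())
```

The intent of combining both decoders is that the reconstructive (auto-encoding) and generative (auto-regressive) signals push more of the passage's semantics into the single bottleneck vector, which is the representation later used for dense retrieval.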

Authors (7)
  1. Xing Wu (69 papers)
  2. Guangyuan Ma (14 papers)
  3. Peng Wang (832 papers)
  4. Meng Lin (4 papers)
  5. Zijia Lin (43 papers)
  6. Fuzheng Zhang (60 papers)
  7. Songlin Hu (80 papers)
Citations (5)
