Span Selection Pre-training for Question Answering (1909.04120v2)

Published 9 Sep 2019 in cs.CL, cs.AI, and cs.LG

Abstract: BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained Transformers have provided large gains across many language understanding tasks, achieving a new state-of-the-art (SOTA). BERT is pre-trained on two auxiliary tasks: Masked Language Model and Next Sentence Prediction. In this paper we introduce a new pre-training task inspired by reading comprehension to better align the pre-training from memorization to understanding. Span Selection Pre-Training (SSPT) poses cloze-like training instances, but rather than draw the answer from the model's parameters, it is selected from a relevant passage. We find significant and consistent improvements over both BERT-BASE and BERT-LARGE on multiple Machine Reading Comprehension (MRC) datasets. Specifically, our proposed model has strong empirical evidence as it obtains SOTA results on Natural Questions, a new benchmark MRC dataset, outperforming BERT-LARGE by 3 F1 points on short answer prediction. We also show significant impact in HotpotQA, improving answer prediction F1 by 4 points and supporting fact prediction F1 by 1 point and outperforming the previous best system. Moreover, we show that our pre-training approach is particularly effective when training data is limited, improving the learning curve by a large amount.
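To make the cloze-with-passage setup concrete, below is a minimal sketch of how an SSPT-style training instance might be assembled: the answer term is blanked out of a query sentence, the cloze query is paired with a passage that contains the answer, and the training targets are the start and end token positions of that answer span in the passage. It assumes the Hugging Face `transformers` library and a `[BLANK]` placeholder string; the function name, data, and span-alignment details are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of building one Span Selection Pre-Training (SSPT)-style
# instance: a cloze query plus a relevant passage, with span targets.
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

def make_sspt_instance(query_sentence: str, answer_term: str, passage: str):
    """Blank the answer out of the query, pair it with a passage containing
    the answer, and return (encoded inputs, start token, end token)."""
    # Replace the answer with a placeholder to form a cloze-like query.
    cloze_query = query_sentence.replace(answer_term, "[BLANK]")

    # Encode query and passage as a pair, as in extractive MRC.
    enc = tokenizer(cloze_query, passage, return_offsets_mapping=True,
                    truncation=True, return_tensors="pt")

    # Character offsets of the answer inside the passage.
    char_start = passage.index(answer_term)
    char_end = char_start + len(answer_term)

    # Map character offsets to token indices, restricted to passage tokens.
    offsets = enc["offset_mapping"][0].tolist()
    seq_ids = enc.sequence_ids(0)
    start_tok = end_tok = None
    for i, ((s, e), sid) in enumerate(zip(offsets, seq_ids)):
        if sid != 1:                       # skip query and special tokens
            continue
        if s <= char_start < e and start_tok is None:
            start_tok = i
        if s < char_end <= e:
            end_tok = i
    enc.pop("offset_mapping")
    return enc, start_tok, end_tok

# Toy usage: the model would be trained to point at "1889" in the passage.
inputs, start, end = make_sspt_instance(
    "The Eiffel Tower was completed in 1889.",
    "1889",
    "Construction of the Eiffel Tower finished in 1889, in time for the "
    "Exposition Universelle held in Paris.",
)
print(start, end)  # token indices of the answer span within the passage
```

The key difference from BERT's Masked Language Model objective, as the abstract describes it, is that the target is supervised as a span selected from the passage rather than a token predicted from the model's parameters.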

Authors (8)
  1. Michael Glass (21 papers)
  2. Alfio Gliozzo (28 papers)
  3. Rishav Chakravarti (11 papers)
  4. Anthony Ferritto (10 papers)
  5. Lin Pan (23 papers)
  6. G P Shrivatsa Bhargav (6 papers)
  7. Dinesh Garg (20 papers)
  8. Avirup Sil (45 papers)
Citations (68)
