
Enhancing Answer Boundary Detection for Multilingual Machine Reading Comprehension (2004.14069v2)

Published 29 Apr 2020 in cs.CL and cs.AI

Abstract: Multilingual pre-trained models can leverage training data from a resource-rich source language (such as English) to improve performance on low-resource languages. However, transfer quality for multilingual Machine Reading Comprehension (MRC) is significantly worse than for sentence classification tasks, mainly because MRC requires detecting word-level answer boundaries. In this paper, we propose two auxiliary tasks in the fine-tuning stage to create additional phrase boundary supervision: (1) a mixed MRC task, which translates the question or passage into other languages and builds cross-lingual question-passage pairs; and (2) a language-agnostic knowledge masking task that leverages knowledge phrases mined from the web. Extensive experiments on two cross-lingual MRC datasets show the effectiveness of the proposed approach.
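
To make the second auxiliary task more concrete, the following is a minimal sketch of phrase-level masking, not the authors' implementation. It assumes a pre-tokenized passage and a small set of knowledge phrases stored as token tuples; the function name `mask_phrases`, the `[MASK]` placeholder string, the greedy longest-match strategy, and the `max_len` parameter are all illustrative assumptions that do not come from the paper.

```python
from typing import List, Sequence, Set, Tuple

MASK = "[MASK]"  # assumed placeholder token, not taken from the paper


def mask_phrases(
    tokens: Sequence[str],
    phrases: Set[Tuple[str, ...]],
    max_len: int = 5,
) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Mask every occurrence of a known phrase using greedy longest match.

    Returns the masked token list and the (start, end) spans that were
    masked, with `end` exclusive. The spans act as phrase-boundary labels.
    """
    masked = list(tokens)
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate window first so longer phrases win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in phrases:
                matched = n
                break
        if matched:
            for j in range(i, i + matched):
                masked[j] = MASK
            spans.append((i, i + matched))
            i += matched
        else:
            i += 1
    return masked, spans


# Hypothetical example data, for illustration only.
tokens = "the eiffel tower is in paris".split()
phrases = {("eiffel", "tower"), ("paris",)}
masked, spans = mask_phrases(tokens, phrases)
```

In this sketch the model would be trained to recover the tokens hidden behind each masked span, so whole phrases rather than isolated word pieces carry the supervision signal. How phrases are mined and how the prediction loss is defined are left out here because the abstract does not specify them.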

Authors (8)
  1. Fei Yuan (28 papers)
  2. Linjun Shou (53 papers)
  3. Xuanyu Bai (1 paper)
  4. Ming Gong (246 papers)
  5. Yaobo Liang (29 papers)
  6. Nan Duan (172 papers)
  7. Yan Fu (31 papers)
  8. Daxin Jiang (138 papers)
Citations (22)
