Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
80 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Enhancing lexical-based approach with external knowledge for Vietnamese multiple-choice machine reading comprehension (2001.05687v5)

Published 16 Jan 2020 in cs.CL

Abstract: Although Vietnamese is the 17th most popular native-speaker language in the world, there are not many research studies on Vietnamese machine reading comprehension (MRC), the task of understanding a text and answering questions about it. One of the reasons is because of the lack of high-quality benchmark datasets for this task. In this work, we construct a dataset which consists of 2,783 pairs of multiple-choice questions and answers based on 417 Vietnamese texts which are commonly used for teaching reading comprehension for elementary school pupils. In addition, we propose a lexical-based MRC method that utilizes semantic similarity measures and external knowledge sources to analyze questions and extract answers from the given text. We compare the performance of the proposed model with several baseline lexical-based and neural network-based models. Our proposed method achieves 61.81% by accuracy, which is 5.51% higher than the best baseline model. We also measure human performance on our dataset and find that there is a big gap between machine-model and human performances. This indicates that significant progress can be made on this task. The dataset is freely available on our website for research purposes.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Kiet Van Nguyen (74 papers)
  2. Khiem Vinh Tran (7 papers)
  3. Son T. Luu (26 papers)
  4. Anh Gia-Tuan Nguyen (13 papers)
  5. Ngan Luu-Thuy Nguyen (56 papers)
Citations (4)