
VLSP 2021 - ViMRC Challenge: Vietnamese Machine Reading Comprehension (2203.11400v3)

Published 22 Mar 2022 in cs.CL

Abstract: One of the emerging research trends in natural language understanding is machine reading comprehension (MRC), the task of finding answers to human questions based on textual data. Existing Vietnamese datasets for MRC research concentrate solely on answerable questions. In reality, however, some questions are unanswerable because the correct answer is not stated in the given text. To address this weakness, we provide the research community with a benchmark dataset named UIT-ViQuAD 2.0 for evaluating MRC and question answering systems for Vietnamese. We used UIT-ViQuAD 2.0 as the benchmark dataset for the Vietnamese MRC challenge at the Eighth Workshop on Vietnamese Language and Speech Processing (VLSP 2021). The task attracted 77 participating teams from 34 universities and other organizations. In this article, we present details of the organization of the challenge, an overview of the methods employed by the shared-task participants, and the results. The highest scores on the private test set are 77.24% in F1-score and 67.43% in Exact Match. The Vietnamese MRC systems proposed by the top 3 teams use XLM-RoBERTa, a powerful pre-trained language model based on the transformer architecture. The UIT-ViQuAD 2.0 dataset motivates researchers to further explore Vietnamese machine reading comprehension and related tasks such as question answering, question generation, and natural language inference.
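The abstract reports results as F1-score and Exact Match (EM) on a dataset that mixes answerable and unanswerable questions. The following is a minimal sketch of how these two metrics are commonly computed for SQuAD 2.0-style data. It is an assumption that the VLSP 2021 scorer follows this definition: the normalization used here (lowercasing, stripping ASCII punctuation, whitespace tokenization) and the helper names `normalize_answer`, `exact_match`, `f1_score`, `best_over_golds`, and `evaluate` are hypothetical and are not taken from the challenge's official evaluation script.

```python
import string
from collections import Counter

# Assumption: ASCII punctuation only; the official scorer may normalize differently.
_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace (simplified, hypothetical)."""
    text = text.lower()  # str.lower() handles Vietnamese diacritics (e.g. "Hà" -> "hà")
    text = "".join(ch for ch in text if ch not in _PUNCT)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between a prediction and one gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Unanswerable case: an empty gold answer matches only an empty prediction.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts tokens shared by prediction and gold.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_golds(metric, prediction: str, golds: list) -> float:
    """Take the maximum score over all reference answers (SQuAD-style)."""
    if not golds:  # no reference answers means the question is unanswerable
        golds = [""]
    return max(metric(prediction, g) for g in golds)


def evaluate(predictions: dict, references: dict) -> dict:
    """predictions: qid -> predicted string ("" means "no answer").
    references: qid -> list of gold strings ([] means unanswerable).
    Returns corpus-level EM and F1 as percentages."""
    em_total = f1_total = 0.0
    for qid, golds in references.items():
        pred = predictions.get(qid, "")  # a missing prediction counts as "no answer"
        em_total += best_over_golds(exact_match, pred, golds)
        f1_total += best_over_golds(f1_score, pred, golds)
    n = len(references)
    return {"exact_match": 100.0 * em_total / n, "f1": 100.0 * f1_total / n}


if __name__ == "__main__":
    references = {"q1": ["Hà Nội"], "q2": [], "q3": ["năm 1945"]}  # q2 is unanswerable
    predictions = {"q1": "Hà Nội", "q2": "", "q3": "1945"}
    print(evaluate(predictions, references))
```

In this toy example, q1 and q2 score fully on both metrics, while the partial answer "1945" for q3 gets EM 0 but F1 of about 0.67, so the corpus scores come out to roughly 66.67 EM and 88.89 F1. This illustrates why the reported F1 (77.24%) sits above the reported EM (67.43%): F1 gives partial credit for overlapping spans, while EM does not.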

Authors (6)
  1. Kiet Van Nguyen (74 papers)
  2. Son Quoc Tran (7 papers)
  3. Luan Thanh Nguyen (12 papers)
  4. Tin Van Huynh (11 papers)
  5. Son T. Luu (26 papers)
  6. Ngan Luu-Thuy Nguyen (56 papers)
Citations (12)