Building Efficient and Effective OpenQA Systems for Low-Resource Languages (2401.03590v2)

Published 7 Jan 2024 in cs.CL

Abstract: Question answering (QA) is the task of answering questions posed in natural language with free-form natural language answers extracted from a given passage. In the OpenQA variant, only a question text is given, and the system must retrieve relevant passages from an unstructured knowledge source and use them to provide answers, which is the case in the mainstream QA systems on the Web. QA systems currently are mostly limited to the English language due to the lack of large-scale labeled QA datasets in non-English languages. In this paper, we show that effective, low-cost OpenQA systems can be developed for low-resource contexts. The key ingredients are (1) weak supervision using machine-translated labeled datasets and (2) a relevant unstructured knowledge source in the target language context. Furthermore, we show that only a few hundred gold assessment examples are needed to reliably evaluate these systems. We apply our method to Turkish as a challenging case study, since English and Turkish are typologically very distinct and Turkish has limited resources for QA. We present SQuAD-TR, a machine translation of SQuAD2.0, and we build our OpenQA system by adapting ColBERT-QA and retraining it over Turkish resources and SQuAD-TR using two versions of Wikipedia dumps spanning two years. We obtain a performance improvement of 24-32% in the Exact Match (EM) score and 22-29% in the F1 score compared to the BM25-based and DPR-based baseline QA reader models. Our results show that SQuAD-TR makes OpenQA feasible for Turkish, which we hope encourages researchers to build OpenQA systems in other low-resource languages. We make all the code, models, and the dataset publicly available at https://github.com/boun-tabi/SQuAD-TR.
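
The abstract reports results in Exact Match (EM) and F1. As a rough illustration of how such span-level scores are commonly computed, here is a minimal sketch in the style of SQuAD evaluation. It is not the authors' evaluation code: the function names (`normalize`, `exact_match`, `f1_score`) and the simplified normalization (lowercasing, punctuation stripping, whitespace tokenization) are assumptions made for this example. In particular, Python's default `str.lower()` is not Turkish-aware (dotted and dotless I are not handled specially), so a real Turkish pipeline would likely need locale-specific casing.

```python
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation, and split on whitespace.

    Assumption: a simplified normalization for illustration only; it does
    not apply Turkish-specific casing rules or article removal.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer and a gold answer."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # If either side is empty (e.g. an unanswerable question), the score
    # is 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often as
    # it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

With multiple gold answers per question, a typical choice is to take the maximum score over the gold answers and then average over all questions; that aggregation step is omitted here.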

Authors
  1. Emrah Budur
  2. Rıza Özçelik
  3. Dilara Soylu
  4. Omar Khattab
  5. Tunga Güngör
  6. Christopher Potts