Building Efficient and Effective OpenQA Systems for Low-Resource Languages (2401.03590v2)
Abstract: Question answering (QA) is the task of answering questions posed in natural language with free-form natural language answers extracted from a given passage. In the OpenQA variant, only a question text is given, and the system must retrieve relevant passages from an unstructured knowledge source and use them to provide answers, which is the case in the mainstream QA systems on the Web. QA systems currently are mostly limited to the English language due to the lack of large-scale labeled QA datasets in non-English languages. In this paper, we show that effective, low-cost OpenQA systems can be developed for low-resource contexts. The key ingredients are (1) weak supervision using machine-translated labeled datasets and (2) a relevant unstructured knowledge source in the target language context. Furthermore, we show that only a few hundred gold assessment examples are needed to reliably evaluate these systems. We apply our method to Turkish as a challenging case study, since English and Turkish are typologically very distinct and Turkish has limited resources for QA. We present SQuAD-TR, a machine translation of SQuAD2.0, and we build our OpenQA system by adapting ColBERT-QA and retraining it over Turkish resources and SQuAD-TR using two versions of Wikipedia dumps spanning two years. We obtain a performance improvement of 24-32% in the Exact Match (EM) score and 22-29% in the F1 score compared to the BM25-based and DPR-based baseline QA reader models. Our results show that SQuAD-TR makes OpenQA feasible for Turkish, which we hope encourages researchers to build OpenQA systems in other low-resource languages. We make all the code, models, and the dataset publicly available at https://github.com/boun-tabi/SQuAD-TR.
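The Exact Match (EM) and F1 scores quoted in the abstract are the usual SQuAD-style answer metrics. The sketch below shows one common way to compute them for a single prediction and gold answer. It is an illustration only, not the authors' evaluation code: the function names `normalize`, `exact_match`, and `f1_score` are hypothetical, and the normalization (lowercasing, removing ASCII punctuation, collapsing whitespace) is an assumption. English article stripping, which the original SQuAD script performs, is left out here because it does not carry over to Turkish.

```python
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace.

    Assumption: Python's default str.lower() is used, which does not apply
    Turkish-specific casing rules (dotted/dotless i). A real Turkish
    evaluation would likely need locale-aware lowercasing.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # If either side is empty, score 1.0 only when both are empty
    # (this covers unanswerable questions with an empty gold answer).
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = "Mustafa Kemal Atatürk"
    pred = "Kemal Atatürk"
    print(exact_match(pred, gold))  # partial answer, so no exact match
    print(f1_score(pred, gold))     # partial credit from token overlap
```

In this example the prediction recovers two of the three gold tokens, so EM is 0 while F1 gives partial credit (precision 1, recall 2/3). Dataset-level scores are typically the mean of these per-question values, taking the maximum over gold answers when a question has several.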
- Emrah Budur
- Rıza Özçelik
- Dilara Soylu
- Omar Khattab
- Tunga Güngör
- Christopher Potts